PREFACE
Abstract 

SETL, like other very high level languages, emphasizes
the use of abstract data structures such as sets and maps.
Efficient data structuring of a SETL program is obtained by
supplying to the language processor detailed descriptions of
structural relations among program variables. These descriptions
center around the concept of 'basings'. The use of
basing declarations leads to efficient data layouts for the
abstract structures of SETL. The concept of basings suggests
techniques for choosing efficient data structures automatically
by means of program analysis. The basing notion thus
provides a unifying framework for automatic data structure
choice. This paper describes a design for a demonstration
system constructed within this framework. The manner in which
such a system would apply to typical examples is illustrated.
A SETL specification of the proposed automatic data structure
choice scheme is also presented.
CHAPTER 1 : INTRODUCTION

Research in automatic programming has over the years
produced a number of increasingly powerful tools for the
writers of software: symbolic assemblers, macro
assemblers, text editors, debugging systems and
algorithmic languages of greater and greater expressive
power and conciseness. Ongoing efforts to design very
high level languages, which incorporate powerful
mathematical primitives and abstract information structures,
e.g., sets and relations, are one of the most important
current aspects of this research. At the present time, a
number of compiler/interpreter systems for languages of
this level, such as PLANNER, MADCAP, VERS2, CONNIVER, LEAP
and SETL, have already been developed. Results so far
with such languages indicate that they do indeed greatly
reduce programming effort, allow the programmer to tackle
problems that would be intractable with lower level
languages, and simplify the production of software.

However, until now, most efforts in this area have
concentrated on language design and basic implementation;
little has yet been done to achieve efficient program
execution. The inefficiency of existing implementations
of languages of this level has been severe enough to
restrict their use to a few research environments. This
inefficiency arises mainly from two sources: generation
of unoptimized code and use of ill-chosen data structures.

By providing powerful semantic primitives and a
comfortable syntax for combining these primitives, very
high level languages can tempt the user into a style of
programming which is highly inefficient if unoptimized.
For example, in SETL, the task of determining the number
of positive integers in a set S can be written:

    N := #{ X ∈ S | X > 0 } ;                               (1)

Taken literally, this involves the construction of a set
for the sole purpose of finding its cardinality. The loop

    N := 0 ; (∀X ∈ S | X > 0) N := N + 1 ;;                 (2)

clearly achieves the same effect at a smaller cost. Yet a
case can be made that (1), and the style of programming it
embodies, represents a good use of the language. The
intent of (1) is more apparent than that of (2); (1) is
expressed more concisely, and can in fact be viewed as a
specification for the 'lower-level' code fragment (2).
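A minimal Python sketch of the contrast between (1) and (2) follows. Python is used only as a stand-in for SETL, and the function names are hypothetical: the first function builds the intermediate set exactly as (1) does when taken literally, while the second is the single-pass counter of (2), the form an optimiser would ideally derive from (1).

```python
def count_positive_setformer(s):
    # Literal reading of (1): build { X in S | X > 0 }, then take its cardinality.
    return len({x for x in s if x > 0})


def count_positive_loop(s):
    # Reading of (2): one pass with a counter and no intermediate set.
    n = 0
    for x in s:
        if x > 0:
            n += 1
    return n


S = {-3, -1, 0, 2, 5, 7}
print(count_positive_setformer(S), count_positive_loop(S))  # 3 3
```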

Similarly, SETL code to create the subset of positive
numbers and the subset of negative numbers in a set S
might be written as:

    POS := { X ∈ S | X > 0 } ;   NEG := { X ∈ S | X < 0 } ;

If the source code is interpreted or compiled directly without
optimisation, the set S has to be scanned twice, which is
certainly less efficient than the code written in the lower
level style:

    POS := nl ;   NEG := nl ;
    (∀X ∈ S) if X > 0 then POS with X ;
             elseif X < 0 then NEG with X ;; end ∀X ;
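The cost difference can be made visible with a small Python sketch (a stand-in for SETL; ScanCounter and the two split functions are hypothetical helpers). ScanCounter records how often its contents are traversed, so the literal two-former version registers two scans of S and the lower level version only one.

```python
class ScanCounter:
    """Wraps a collection and records how many times it is scanned (illustrative only)."""

    def __init__(self, items):
        self.items = list(items)
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.items)


def split_two_pass(s):
    # Literal reading of the two set formers: one scan of s for each.
    pos = {x for x in s if x > 0}
    neg = {x for x in s if x < 0}
    return pos, neg


def split_one_pass(s):
    # Lower level style: a single scan fills both result sets.
    pos, neg = set(), set()
    for x in s:
        if x > 0:
            pos.add(x)
        elif x < 0:
            neg.add(x)
    return pos, neg


two = ScanCounter({-2, -1, 0, 3, 4})
one = ScanCounter({-2, -1, 0, 3, 4})
assert split_two_pass(two) == split_one_pass(one)  # same results
print(two.scans, one.scans)  # 2 1
```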

These two examples show that a program written in a style
that fully exploits a very high level language can never be
realised with reasonable efficiency without a powerful
optimiser. In particular, the creation of redundant objects
and redundant looping through composite objects should be
avoided. Traditional code optimisation techniques are in
general useful for this purpose. Some of the prior work in
this area is briefly reviewed in section 1.2 of this chapter.


A second main source of inefficiency lies in the fact
that, at the implementation level, the run-time support
library, which realises the high level semantic constructs
of a very high level language, must use 'general'
structures, which can support, with roughly even
efficiency, all the various operations likely to be
applied to the data objects of the various types provided
by the language. Almost invariably, this general
structure will not be the best choice for a specific
application. SETL, for instance, realizes a set by a hash
table structure. Obviously, this is not the best
structure for sets which are only subject to algebraic
operations such as union and intersection. In this case,
the use of bit string structures would be more
appropriate. It is clear that to overcome this difficulty
the language processor itself must have the capability of
choosing efficient data structures to represent the
abstract objects of the program and code sequences to
realize the abstract operations to be performed on these
objects. To accomplish this, we require a so-called
automatic data structure choice system (subsequently
denoted by 'ADSC system').


The research reported here is an attempt to
demonstrate the feasibility of building such an ADSC
system. We have designed a demonstration system, based on
the notion of 'basing' (to be explained in the next
chapter), to automatically choose data structures for
variables of SETL programs. Since our system utilizes the
information collected by other optimization techniques, it
was designed as the final phase of a powerful SETL
optimizer which incorporates a wide variety of useful, but
better-established, optimization techniques. Though we
have considered only a few abstract data structures which
are available in SETL, we believe that the techniques used
in the proposed ADSC system are generally applicable.


1.1 Automatic Data Structure Choice System

A reasonable approach to an ADSC system can be as
follows:

(1) First, we have to choose a basic family of data
structures. In attempting to regularise the process of
data structure choice we do not ask whether all, or even
many, of the data structures that might be used by an
experienced programmer can be duplicated automatically.
Rather, we try to find some narrow subfamily of the family
of all possible data structure choices, doing this in a
way which guarantees that some choice in our subfamily is
an adequate replacement for any choice which a programmer
is likely to make. There is no doubt that a poor initial
choice of subfamily may have a severely deleterious effect
on the execution efficiency of the operations defined in
the language. Choosing a proper family of data structures
is therefore quite important. In making this choice, we
must consider at least the operations to be applied to
each of our abstract structures, the relative frequencies
of these operations, and the relative importance of
conserving time versus space.


(2) Having chosen a basic family of data structures, we
can estimate the execution speed of each possible
operation on these different data structures. This
provides fundamental information about each individual
structure that is considered in the process of making an
eventual data structure choice. A difficulty in this
process arises from the fact that the execution speed of
an operation may be a function both of the data structures
of its operands and of the expected size of each operand.
These sizes are not deducible directly from the program
text. Inhomogeneity of the components of a composite
object also makes the speed estimation problem unsolvable in
some cases. Hence some kind of approximation has to be applied
at this step. However, this approximation should not blur our
analysis so much that we are not able to distinguish
between different data structures.

(3) Next, we have to construct a fact collector to
analyze source programs. The fact collector must be able
to derive the information which is relevant to the
structure choice algorithm that we intend to use.
Standard global program analysis techniques such as
interval analysis, data flow analysis, value flow analysis
and plausible relation deduction are all valuable here.

(4) As a final step, we must design a structure choice
algorithm. This is the heart of an ADSC system. Here,
the speed functions or the characteristics of data
structures developed in step (2) and the gathered
'facts' concerning each individual program must be used in
order to minimize the total cost of running the program.
Of course, the design of such a decision-making algorithm
will be somewhat ad hoc, because the facts
which enter into good data structure choice are not all
accessible to a program analysis routine. We may have to
make some crudely empirical decisions, seeking guidance in
this by manually translating a representative variety of
algorithms written in the source language into equivalent but
efficient code in the target lower level language. If
such translation is done as systematically as possible and
in a highly 'self-conscious' way, then by noting the facts
which repeatedly appear relevant in the process, reasonably
good mechanical data structure choice algorithms may be
achieved.
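In its crudest form, step (4) can be pictured as a minimisation over a table of per-operation costs. The Python sketch below is hypothetical throughout: the representation names, the cost figures and the operation frequencies are invented placeholders rather than measurements, each variable is treated independently, and operand sizes, which step (2) notes also affect speed, are ignored.

```python
# Hypothetical relative costs of each abstract operation on each candidate
# representation (standing in for the per-structure speed estimates of step (2)).
COST = {
    "hash_table": {"member": 1, "union": 10, "iterate": 5},
    "bit_string": {"member": 1, "union": 1, "iterate": 8},
}


def total_cost(rep, op_counts, cost=COST):
    """Estimated cost of one variable's operations if it uses representation rep."""
    return sum(freq * cost[rep][op] for op, freq in op_counts.items())


def choose(op_counts, cost=COST):
    """Step (4) in miniature: pick the representation with the least total cost."""
    return min(cost, key=lambda rep: total_cost(rep, op_counts, cost))


# Operation frequencies of the kind a fact collector (step (3)) might estimate.
print(choose({"union": 100, "iterate": 1}))   # bit_string
print(choose({"iterate": 100, "member": 10}))  # hash_table
```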


During the design of our ADSC system, we have used
this approach. A first step in our approach was to
introduce a 'basing system', which in effect defines the
subfamily of data structures with which we work.
Initially, a declaration language which allows manual
definitions of basings was provided; see
Schwartz [1971a, 1971b] and Schwartz [1976a, 1976b]. This
gave an extremely useful tool to experiment with data
structure choice and to explore various styles of choice.
Then, exploiting this experience, we have designed our
automatic basing choice system. Chapter 2 of this thesis
will present the 'basing system' on which our work rests,
while chapter 3 will describe the automatic data structure
choice algorithm.

The SETL optimizer has been assumed to be available as
the program analysis component of our system. This gives
us the global information required by our ADSC system. In
this thesis, no detailed discussion of the analysis
techniques used in this optimizer will be given; see,
however, Vanek [1976b], Sharir [1977] and Grand [1978].


1.2 Related Work

In this section we will review prior work related to
our research.

Tenenbaum [1974] shows how to analyze SETL programs
automatically to determine the types of the variables used
in them. For this purpose, he builds a data type lattice
with 'conjunction' and 'disjunction' operations. A
revised version of this data type finder is described by
Vanek [1976a]. The automatic data structure choice system
presented in this paper will utilize the information
provided by such an automatic 'type finder'.


The two-part paper by Schwartz [1975c] introduces a
number of analysis and optimization techniques for general
very high level languages, in particular for SETL. One of
these techniques is value flow analysis, which generalizes
conventional data flow analysis to structured data
objects. A second technique introduced in this paper is
copy optimisation, which allows us to derive criteria
under which we can modify existing data objects in place
rather than generate new copies when they are logically
modified. A related copy optimization technique first
described by Dewar [1977] has been implemented in the
current SETL optimizer. Finally, the paper describes a
technique of inclusion and membership determination which
can provide valuable information for data structure choice.

Low [1974] describes an interesting approach to the
selection of representations for particular data types
within the framework of a small set of pre-selected data
structure alternatives. The choice is made via a cost
analysis of each alternative for the operations actually
performed in the program. The cost of a program is
calculated in terms of execution time and required space.
Combinatorial explosion in the calculation of program cost
is avoided by insisting that all variables (of the same
type) subject to a single operation have the same
representation structure. Low [1978] enhances this idea
and gives an example and overview. A shortcoming of this
approach is that logical relationships among program
variables, which should play an important role in the
process of choosing proper data structures for program
variables, are completely ignored.
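The restriction described above can be sketched as follows: the operands of each operation are merged into one group with a union-find structure, and each group then receives the single representation with the least summed cost. This is a simplified illustration, not Low's actual system; the cost table and the operation list are hypothetical, and, like the approach it illustrates, it takes no account of logical relationships among variables.

```python
from collections import defaultdict

# Hypothetical per-operation costs for two candidate representations.
COST = {
    "hash_table": {"member": 1, "union": 10, "iterate": 5},
    "bit_string": {"member": 1, "union": 1, "iterate": 8},
}


def group_operands(operations):
    """Union-find: variables that meet in any one operation fall into one group."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for _, operands in operations:
        for v in operands:
            find(v)  # register every operand, including single-operand ones
        for a, b in zip(operands, operands[1:]):
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    return list(groups.values())


def choose_per_group(operations, cost=COST):
    """Give every variable in a group the one representation with least summed cost."""
    choice = {}
    for group in group_operands(operations):
        ops = [op for op, operands in operations if group & set(operands)]
        best = min(cost, key=lambda rep: sum(cost[rep][op] for op in ops))
        for v in group:
            choice[v] = best
    return choice


ops = [("union", ["A", "B"]), ("union", ["A", "B"]),
       ("iterate", ["C"]), ("iterate", ["C"]), ("member", ["C"])]
print(choose_per_group(ops))
```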

Bruce [1976] describes an APL optimizer incorporating
various established optimisation techniques. The
performance benefit of various possible program
transformations is estimated, and a realisation of the target
program which gains maximum benefit is sought.
Combinatorial explosion difficulties in this process are
avoided by a look-ahead scheme. The work of this
optimiser is structured to avoid the creation of unnecessary
temporaries, in particular temporaries with large
aggregate values. Loop optimisation techniques such as
loop paralleling, loop switching and loop jamming are
employed. A simplified copy optimisation principle which
allows arrays to be used destructively is also included.
No attempt to select optimal data structures is made.

Rovner [1976] has extended Low's work to finding
implementations for associative data structures and
accesses. Redundant representations for data structures
are allowed. Rovner uses Low's hill climbing approach, with
some additional heuristics about cost trade-offs added
to the applicability conditions.


Kant [1977] describes a system, LIBRA, which aims to
identify an efficient set of implementations for abstract
constructs in a very high level program. The high level
constructs are realized by step-by-step refinement of
partially refined programs (called program descriptions).
Within this framework, three basic issues are addressed:
picking a program description to refine, picking a
particular statement or group of statements to refine, and
selecting a refinement for the chosen statement. At each
level of selection, a set of heuristic rules is applied
first, followed by a cost analysis. If the outcome of the
cost analysis is not clear, separate program descriptions
are set up to test several possibilities. The refinement
process continues until a program in the target language
is generated. In this system, multiple representations
and the use of different representations for a variable in
different parts of a program are allowed.


1.3 Review of Salient Features of the SETL Language

Before starting to describe our system, we shall
review some of the important features of the language,
SETL, that we are going to deal with. A general survey of
this language is given by Kennedy and Schwartz [1975]. For
a detailed account, see Schwartz [1973]. Several semantic
and syntactic changes made recently in the new version of
the language are summarized in Schonberg [1976]. For a
complete language description and programming reference,
see Dewar [1975]. Appendix A lists most of the primitive
operations of SETL, and shows the syntactic form in which
they are written.

SETL is a very high level, general purpose language
based upon the dictions and semantic notions of the theory
of sets. Both atomic and composite data types are
supported. Atomics include the data types commonly found
in most programming languages, such as integers, reals,
bit strings (of which boolean values are a special case),
character strings, subroutines and functions.

Sets and tuples are the two basic composite data types.
Sets are the most important objects in the language; their
uses characterize its semantics. A set is an
unordered finite collection of distinct SETL objects, and
thus may contain atoms, tuples and other sets. Sets may
be formed by enumeration, e.g. {1,2,3}, or by using a
general set-former construction. The general set-former
has the form

    { EXP : RANGE | COND }

where EXP is a general expression, RANGE describes the
iterative operation which calculates successive values of
EXP, and COND specifies which of these values shall
actually become members of the set being built.

A tuple is an ordered sequence of SETL objects. Two
tuples are equal if all their corresponding components are equal.
Tuples, like sets, are built by explicit enumeration or by
means of a tuple-former expression,

    [ EXP : RANGE | COND ]

which is analogous to a set-former expression. Sets and
tuples can be nested freely as components and members of
each other to any depth and can be entirely inhomogeneous.
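Python comprehensions are a close, if imperfect, analogue of these formers, and the sketch below maps EXP, RANGE and COND onto them. It is illustrative only: Python sets may hold only immutable (hashable) values, so nested members must be written as tuples or frozensets, whereas SETL sets and tuples nest freely as described above.

```python
S = {1, 2, 3, 4, 5, 6}
T = [3, 1, 2, 3]  # an ordered sequence; repeats are allowed

# Set former with EXP = x * x, RANGE = x ranging over S, COND = x even.
evens_squared = {x * x for x in S if x % 2 == 0}

# Tuple former with EXP = x * 10, RANGE over the sequence T, COND = x > 1:
# order and repeated values are preserved.
scaled = [x * 10 for x in T if x > 1]

# Inhomogeneous, nested members: immutable types are required in Python.
mixed = {1, "one", (1, "one"), frozenset({2, 3})}

print(sorted(evens_squared), scaled)  # [4, 16, 36] [30, 20, 30]
```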

Maps, which are special types of sets, play a special
role. Maps are functions in the sense of set theory, i.e.
sets of ordered pairs [X,Y] (i.e. tuples of length 2),
where X is an element of the domain of the map and Y is the
corresponding element of its range. SETL allows such sets
of pairs to be used as 'tabular functions' or relations.
Maps can be manipulated in terms of their set structure,
i.e. as sets of pairs. However, and most importantly,
functional retrieval and storage operations can be applied
to them. If F is such a map, then F(X) yields the second
element of the unique pair in F whose first element is X.


Maps need not be single-valued. A map may contain
several pairs which have the same first element, in which
case the map is called a multi-valued map. If F contains
both [X,Y] and [X,Z], then the expression F(X) is
undefined. However, the set of all values into which a
map sends a given element X of its domain can be retrieved
by using the expression F{X}. In this case, F{X} yields
the set {Y,Z}. If F is single-valued at X, F{X} yields a
singleton set.

An additional range retrieval operation is provided
for maps. If F is a map and S is a set, then the
expression F[S] is the set of images of elements of S
under F, i.e. it denotes the set

    { Y : X ∈ S, Y ∈ F{X} }

Because of the essential role they play in SETL, sets and
maps are the central objects studied in this paper.

The undefined atom OM is a particular constant related
to various SETL operations. OM is not allowed to be a
member of any set but can be a component of a tuple. It
is invalid in most contexts within expressions but can
appear in an equality test. It is the valid result of
several operations on sets and tuples. In particular, (a)
if F is a map then the expression F(X) yields OM if F has
not been defined or is multiply defined on X, (b) it is
the value of an iterator variable at the end of an
iteration, and (c) it is obtained when extracting an
arbitrary element from the empty set.
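The retrieval operations on maps can be sketched in Python by modelling a map as a set of 2-tuples. Two assumptions are made: Python's None stands in for OM, and the lookups scan the whole set for clarity, whereas the SETL run-time library uses hash tables. The hypothetical apply_map returns OM both when F is undefined at X and when it is multiply defined there, matching case (a) above.

```python
OM = None  # assumption: Python's None plays the role of SETL's undefined atom OM


def apply_map(f, x):
    """F(X): second component of the unique pair in f whose first component is x, else OM."""
    ys = [y for (a, y) in f if a == x]
    return ys[0] if len(ys) == 1 else OM


def image_at(f, x):
    """F{X}: the set of all values into which f sends x."""
    return {y for (a, y) in f if a == x}


def image_of(f, s):
    """F[S] = { Y : X in S, Y in F{X} }."""
    return {y for x in s for y in image_at(f, x)}


F = {(1, "a"), (2, "b"), (2, "c")}       # multi-valued at 2
print(apply_map(F, 1))                   # a
print(apply_map(F, 2), apply_map(F, 3))  # None None
print(sorted(image_of(F, {1, 2})))       # ['a', 'b', 'c']
```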

The control structure of SETL is largely conventional
and tends to follow ALGOL 60. Conditional statements and
expressions are provided, as are if-then-else clauses,
while-loops, cases, subprocedures and functions. 'quit'
and 'continue' statements are included for abnormal loop
control; 'quit' causes control to leave the loop and
'continue' returns control to the beginning of the loop.
A quit or continue statement is applied to the innermost
loop with opening tokens matching the tokens following the
keyword 'quit' or 'continue'.

The least familiar SETL control form is the
iterator-over-set, which is written as

    (∀X1 ∈ E1, ..., Xn ∈ En | C(X1, ..., Xn)) block ; end ∀ ;

where E1, ..., En are expressions with set values, C is a
boolean expression of X1, ..., Xn, and 'block' is any
sequence of statements. This iterator executes
'block' repeatedly, once for each group X1, ..., Xn of
variable values belonging to E1, ..., En respectively and
satisfying the boolean condition C(X1, ..., Xn). Convenient
syntax to iterate over a map is also provided: in the
iterator '∀Y := F(X)', X varies over the domain of the map
F, and Y receives the corresponding range element.
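The iterator-over-set can be approximated in Python with itertools.product, which enumerates every group of values drawn from E1, ..., En; the condition C then filters the groups. The sketch assumes that E1, ..., En are fixed sets that do not depend on earlier iteration variables (a dependent range would need nested loops instead of a product), and the helper name iterate is hypothetical.

```python
from itertools import product


def iterate(condition, *ranges):
    """Yield each group (x1, ..., xn), xi drawn from the i-th range, satisfying condition.

    Assumes the ranges are fixed collections that do not depend on earlier iterates.
    """
    for group in product(*ranges):
        if condition(*group):
            yield group


# (∀X1 ∈ E1, X2 ∈ E2 | X1 < X2) block ; end ∀ ;  where the block collects each pair
E1, E2 = {1, 2, 3}, {2, 3}
pairs = {(x1, x2) for x1, x2 in iterate(lambda a, b: a < b, E1, E2)}
print(sorted(pairs))  # [(1, 2), (1, 3), (2, 3)]
```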

Functions and subroutines can be recursive.
Parameters are passed by value without return, i.e., the
values of the actual arguments passed to a function or
subroutine are not changed in the calling routine. No
value can be returned from a subroutine. The value to be
returned from a function must be specified in a return
instruction.

Format is free, and statements are punctuated by
semicolons. The PL/1 comment convention is employed. The
abbreviated statement 'X op Y ;' stands for 'X := X op Y ;'.
A special primitive 'from' is also provided: the statement
'X from S ;' is synonymous with 'X := arb S ; S less X ;'.
A front-end macroprocessor is included as a convenience in
the SETL compiler. Macro definitions have the form

    macro NAME(NAMELIST1 ; NAMELIST2) text end NAME ;

where NAMELIST1 contains the macro's arguments, and
NAMELIST2 is a list of names for which new identifiers are
generated for corresponding names in the macro body. Both
namelists are optional.
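The expansion of 'from' can be sketched in Python on a mutable set; take_from is a hypothetical helper and None again stands in for OM. Returning OM for an empty set mirrors the rule that arb of the empty set yields OM. Python's built-in set.pop() performs a similar arbitrary removal in one call, but raises KeyError on an empty set instead.

```python
OM = None  # assumption: None stands in for SETL's undefined atom OM


def take_from(s):
    """Sketch of 'X from S ;', i.e. 'X := arb S ; S less X ;', on a mutable Python set."""
    if not s:
        return OM          # arb of the empty set yields OM
    x = next(iter(s))      # arb S: some element, with no promise about which one
    s.discard(x)           # S less X, performed in place
    return x


S = {10, 20, 30}
X = take_from(S)
print(X not in S, len(S))  # True 2

# The abbreviated statement 'S with Y ;' (S := S with Y) has this in-place analogue:
S |= {40}
```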

A SETL program consists of a set of separately
compiled 'modules', each of which contains a set of
functions and subprocedures. Variables are local by
default but may be declared global to a module. Global
variables may be made 'public', allowing them to be
included in other modules. A user can determine that
certain variables are stored statically while others are
stacked on entry to a routine.

Every program must contain a module called MAIN. This
module should contain a block of instructions which forms
the main program. A simple form of initialisation block
is also implemented. Only static variables can be
initialized; this initialisation is performed before the
start of execution.


1.4 Definitions

A number of terms which facilitate later discussion
are defined in this section.

Occurrences :

An occurrence is a use or definition of a program
variable.

Ovariables :

An ovariable is an occurrence at which the variable is
assigned a new value.

Ivariables :

An ivariable is an occurrence at which the value of a
variable is retrieved. For example, in the instruction
'X := X+1;', the X on the left hand side is an ovariable
while the X on the right hand side is an ivariable.

FFROM map :

FFROM is one of our main data flow maps. FFROM{O1}
maps an occurrence O1 of a variable V to the set of
occurrences of V which can be reached from O1 through a
V-clear path, where a V-clear path means a path on which
there are no intervening occurrences of V. Note that the
transitive closure of FFROM gives the traditional
'definition-use chain'.

BFROM map :

The map BFROM is the inverse of FFROM.

CRTHIS map :

This is a value flow mapping. CRTHIS{O1} maps a
variable occurrence O1 to its creation points, i.e., the
set of all ovariables whose evaluation can create an
object which at some moment in the execution of the
program becomes the current value of O1. For example,

    L1 : Y := X + 1 ;
    L2 : S with Y ;
    L3 : Z from S ;
    L4 : U := Z ;

If we use the notation Vi to denote the occurrence of the
variable V at instruction Li, then some of the
relations among the variable occurrences in these four
instructions will be

    FFROM{Y1} = {Y2},    FFROM{S2} = {S3},
    BFROM{Y2} = {Y1},    BFROM{S3} = {S2},
    Y1 ∈ CRTHIS{U4}

PS-CRTHIS map :

This is the value flow mapping to be used in our data
structure choice algorithm. PS-CRTHIS{O1} maps a variable
occurrence O1 to the pseudo creation points of O1, i.e.,
the set of variable occurrences which are the ovariables
of value creation or value retrieval instructions and
whose values can be transmitted to O1 through simple
assignment instructions. This map is similar to the CRTHIS
map, but it does not link occurrences whose values may
be transmitted from one to the other through a series of
value insertions into and value extractions from composite
objects. In the example above, the pseudo creation point
of the U appearing at L4 is the Z appearing at L3, while
its creation point is the Y appearing at L1.
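The definitions can be exercised on the four-instruction example with a small Python sketch. The encoding is hypothetical: each instruction is reduced to a kind, one target and its sources; occurrences are (variable, instruction number) pairs, written Vi in the text; and only straight-line code is handled, so the V-clear successor of an occurrence is simply the next occurrence of V. Only 'with' and 'from' are modelled as insertion into and extraction from a composite object. Under these assumptions the sketch reproduces the relations listed above, with PS-CRTHIS{U4} = {Z3} and Y1 ∈ CRTHIS{U4}.

```python
from collections import defaultdict

# Hypothetical encoding of the example: number -> (kind, target, sources).
PROGRAM = {
    1: ("create", "Y", ["X"]),   # L1 : Y := X + 1 ;  creates a new value
    2: ("insert", "S", ["Y"]),   # L2 : S with Y ;    S := S with Y
    3: ("extract", "Z", ["S"]),  # L3 : Z from S ;    retrieves a member of S
    4: ("copy", "U", ["Z"]),     # L4 : U := Z ;      simple assignment
}


def occurs(var, i):
    _, target, sources = PROGRAM[i]
    return var == target or var in sources


# Straight-line code: the only V-clear successor of Vi is the next occurrence of V.
FFROM, BFROM = defaultdict(set), defaultdict(set)
for i, (_, target, sources) in PROGRAM.items():
    for var in {target, *sources}:
        later = [j for j in PROGRAM if j > i and occurs(var, j)]
        if later:
            FFROM[(var, i)].add((var, later[0]))
            BFROM[(var, later[0])].add((var, i))


def back(f, var, i):
    """Apply f to every occurrence reaching (var, i) and unite the results."""
    return set().union(*(f(v, j) for v, j in BFROM[(var, i)]))


def ps_crthis(var, i):
    kind, target, sources = PROGRAM[i]
    if var != target:
        return back(ps_crthis, var, i)
    if kind == "copy":               # follow simple assignments only
        return ps_crthis(sources[0], i)
    return {(var, i)}                # creation or retrieval: a pseudo creation point


def crthis(var, i):
    kind, target, sources = PROGRAM[i]
    if var != target:
        return back(crthis, var, i)
    if kind == "copy":
        return crthis(sources[0], i)
    if kind == "extract":            # look through the composite object
        return members(sources[0], i)
    return {(var, i)}


def members(var, i):
    """Creation points of values that may have been inserted into var by (var, i)."""
    kind, target, sources = PROGRAM[i]
    if var != target:
        return back(members, var, i)
    if kind != "insert":
        return set()                 # whole-object assignments are not modelled here
    return back(members, var, i) | crthis(sources[0], i)


print(ps_crthis("U", 4), crthis("U", 4))  # {('Z', 3)} {('Y', 1)}
```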
CHAPTER 2 : THE SETL BASING SYSTEM

In our approach to an ADSC system, we have used the
semi-automatic data structuring system provided in the
presently implemented SETL language as a stepping stone.
In this SETL 'basing language', efficient data structures
for the objects of a SETL program are defined by supplying
the language processor with detailed declarations of
structural relations among these objects. We regard the
definition and implementation of this 'basing language' as
two essential preliminary steps to our ADSC system. These
two steps determine the basic family of data structures
which we must consider, and give us a systematic language
for characterizing data structures. The two succeeding steps,
designing a data structure choice algorithm and the fact
collector on which it rests, then become the major issues
to be studied in designing our total system.

In this chapter, we will describe the basing system,
clarify the fundamental notions which it embodies, and
illustrate its application.


2.1 The Notion of 'Basing'

In the absence of user-supplied declarations, the SETL
processor chooses, for the sets and maps appearing in a
program, a default representation which is reasonably
efficient for most of the primitive set operations
commonly invoked. However, it is clear that very large
gains in efficiency can be obtained if program variables
are represented in a way which depends on the specific
operations into which they enter. To give an obvious
example, if sets S1 and S2 are known to be subsets of some
other set B, then bit-vector representations for S1 and S2
(where an on-bit position indicates the presence of a
given element of B in the corresponding subset) can be
very advantageous if the intersection operation, S1 * S2, is
to be performed frequently.

Generalising the fundamental technique apparent here,
we say that an object X is based on another object B if X is
represented in some special, abbreviated form such that
the presence of B is required for the full description of
X. In this case B is said to be a 'base'.

In the present SETL system, two kinds of basings have
been introduced.

(1) Member Basings - This scheme enhances the efficiency of
SETL (which is a value language) by using pointer mechanisms
internally. Whenever the value V of a variable X is known,
either by declaration or by some decision made by the
optimizer, to be an element of another set B (called the
base), X holds a pointer to the element of B having the
same value, instead of the value V itself. This technique
saves substantial execution time whenever internal 'locate'
operations are required in subsequent uses of the variable
X: with such a pointer available, an indirect reference can
replace a more expensive series of searches and equality
tests. This can be extremely significant, especially when
the value of X is a composite object.
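To make the pointer mechanism concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not the SETL run-time system: a dict stands in for the base's hash table, and a member-based variable keeps a reference to the element block rather than a copy of the value. The class names and the search counter are hypothetical.

```python
# Hypothetical model of a member basing: a variable declared to be an
# element of base B stores a reference to B's element block, not the value.
class ElementBlock:
    __slots__ = ("value",)

    def __init__(self, value) -> None:
        self.value = value


class MemberBase:
    def __init__(self) -> None:
        self._table = {}     # stands in for the base's hash table
        self.searches = 0    # hash searches performed, for illustration only

    def locate(self, value) -> ElementBlock:
        # One hash search (with equality tests) when a value enters the base.
        self.searches += 1
        block = self._table.get(value)
        if block is None:
            block = ElementBlock(value)
            self._table[value] = block
        return block


B = MemberBase()
x = B.locate((1, (2, 3)))   # a composite value, located once
y = B.locate((1, (2, 3)))   # an equal value maps to the same block
# Comparing two member-based variables is now a pointer comparison.
same = x is y
```

Once x and y hold blocks, comparing them is a single identity test however large the value is; the hash searches are paid only when values enter the base.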

(2) Domain Basings - When a set S is known to be a subset
of another set B (which again we call the base of S), it
can be represented by a collection of bits associated with
B, each indicating whether a particular element of B is in
S. These bits can be stored either locally with the
elements of B, or remotely, i.e., the whole collection of
bits can be stored as a bit string, and each element of B
can be supplied with an index which can be used to address
all such 'remote bit strings'. A common advantage of both
the 'local bit' and the 'remote bit' representations is
that they can save substantial space. Moreover, the remote
representation can speed up boolean operations on sets very
greatly. The same approach can also be extended to maps
whose domains are known to be subsets of the base. In this
case, the collection of bits is replaced by a collection of
map value pointers.


In summary, basing, which introduces indexing and pointer
notions at the implementation level of the SETL system,
will play a central role in our approach.


2.2 Data Structure for Based Representations

In  order  to  make  clear  the  efficiency  gains  obtainable 
in  the  presence  of  basings,  the  concrete  representations 
used  for  based  objects  are  discussed  in  this  section. 


In the absence of declarations, the fundamental structure
used to represent a set in SETL is a breathing hash-table,
i.e., a hash-table whose table size is adjusted dynamically
in order to keep the length of clash-lists approximately
constant, so that the membership operation takes a time
which is (on average) independent of the size of the set
being searched. Since, in the standard SETL situation, map
retrieval, set union and intersection all involve internal
membership tests, a significant part of the execution cost
of an undeclared SETL program is roughly proportional to
the total number of hash-search operations performed (here
we disregard the overhead involved in reallocating
hash-tables). The efficiency advantage secured by the use
of based representations therefore results from the
possibility of replacing these hash-search operations by
simpler code sequences, typically involving only one or two
indexing operations, and from the possibility of using
bit-parallel operations in some favorable cases.
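The breathing behaviour can be sketched in a few lines of Python. This is an assumption-laden illustration, not the SETL library code: the growth threshold `max_load`, the doubling policy and the class name `BreathingSet` are our own choices, and this table only grows (it never shrinks).

```python
# Hypothetical sketch of a "breathing" hash table. Assumptions (ours): the
# bucket array doubles whenever the mean clash-list length would exceed
# max_load; shrinking on deletion is omitted for brevity.
class BreathingSet:
    def __init__(self, max_load: float = 2.0) -> None:
        self.max_load = max_load
        self.buckets = [[] for _ in range(4)]   # clash lists
        self.size = 0

    def _clash_list(self, x) -> list:
        return self.buckets[hash(x) % len(self.buckets)]

    def __contains__(self, x) -> bool:
        # Scans a single clash list whose mean length is bounded.
        return x in self._clash_list(x)

    def add(self, x) -> None:
        lst = self._clash_list(x)
        if x in lst:
            return
        lst.append(x)
        self.size += 1
        if self.size > self.max_load * len(self.buckets):
            self._resize(2 * len(self.buckets))

    def _resize(self, n_buckets: int) -> None:
        # Reallocation: rehash every element into a larger bucket array.
        elements = [x for lst in self.buckets for x in lst]
        self.buckets = [[] for _ in range(n_buckets)]
        for x in elements:
            self._clash_list(x).append(x)

    def mean_clash_length(self) -> float:
        return self.size / len(self.buckets)


s = BreathingSet()
for i in range(1000):
    s.add(i)
```

Because the table doubles, the total rehashing work stays proportional to the number of insertions, which is the reallocation overhead that the cost estimate above sets aside.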
The possible candidates for an extended library of set
representations are numerous. The structures selected for
incorporation into our library reflect this potential
variety, but are constrained by the need to keep our
library down to a manageable size, and by our subjective
judgement concerning the most important language constructs
to optimise. The most significant of the structures we use
are as follows.

If a map F is known to be based on B, and X is known to be
an element of B, then the value of F(X) may be stored as
part of the element block in B which represents X. To do
this, we allocate, in each base element block, fields which
will hold the values of some of the maps which are defined
on these elements. Successive fields in this block
correspond to various maps F1, F2, ..., Fn whose domains
are known to be subsets of B. When this representation is
used, retrieval or assignment of F(X) becomes an indexing
operation which uses an offset associated with F (known at
compile time) to access the appropriate field in the
element block for X. The representation used for X must
then contain a pointer to the element block in B which
represents X. Note, however, that in dealing with elements
not known to belong to B, we must be able to locate them in
B by using a standard hash-search procedure. Therefore, in
most cases, B must retain most of the hash-table structure
of standard unbased sets.
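A minimal Python sketch of the local map layout follows. It assumes a fixed number of map fields per element block and uses the integer offsets `F1` and `F2` as stand-ins for compile-time field offsets; all names are hypothetical, and a Python list plays the role of the element block's fields.

```python
# Hypothetical sketch of local maps based on B. Each element block carries
# one field per map; a map is identified by its field offset, which a
# compiler would know statically. UNDEFINED marks a missing value.
UNDEFINED = object()


class ElementBlock:
    def __init__(self, value, n_fields: int) -> None:
        self.value = value
        self.fields = [UNDEFINED] * n_fields


class LocalBase:
    def __init__(self, n_map_fields: int) -> None:
        self.n_fields = n_map_fields
        self.table = {}   # the base keeps a hash table for unbased lookups

    def locate(self, value) -> ElementBlock:
        # Standard hash search, needed only for values not known to be in B.
        block = self.table.get(value)
        if block is None:
            block = ElementBlock(value, self.n_fields)
            self.table[value] = block
        return block


F1, F2 = 0, 1                 # illustrative offsets of two maps based on B

B = LocalBase(n_map_fields=2)
x = B.locate("node7")         # x now holds a pointer to its element block
x.fields[F1] = 42             # F1(x) := 42 is a single indexed store
x.fields[F2] = "blue"         # F2(x) := 'blue'
```

Retrieving F1(x) later is again one indexed load; only the initial `locate` pays for a hash search.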


The map representation described above can be said to be
local: map values are directly attached to the
representation of elements of the map domain. This
representation optimizes retrieval operations, but is
awkward for global operations such as iteration and
copying, because of the distributed nature of the
representation. An alternative representation, which is
equivalent in amount of storage used and only slightly less
efficient for retrievals, is available if we store the
range of a map as a tuple, and incorporate a single index
integer in each element of the map domain (i.e., of the
base), using this index to access the tuple. The index is
assigned sequentially: each new value that is made an
element of the base receives the next index. Suppose the
object X, X ∈ B, has index I. Then the value of F(X) is
found by retrieving TF(I), where TF is the tuple
representing F. This representation, which is a kind of
dual to the local representation described previously, is
said to be 'remote', and F is said to be a 'remote map'
based on B.
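The remote layout can be sketched as follows. This is a hypothetical Python model, not the actual SETL structure: the base hands out sequential indices, the tuple TF is a Python list, and `None` is assumed to stand for an undefined map value (so such a map cannot store `None` itself).

```python
# Hypothetical sketch of a remote map based on B. Each base element gets a
# sequential index when it is inserted; the map F is a tuple TF (a list
# here), so F(X) is TF[index of X]. None stands for an undefined value.
class RemoteBase:
    def __init__(self) -> None:
        self.index = {}   # value -> index; the base's own hash table

    def insert(self, value) -> int:
        # New elements receive 0, 1, 2, ... in order of insertion.
        return self.index.setdefault(value, len(self.index))


class RemoteMap:
    def __init__(self) -> None:
        self.tf = []      # TF: the range of F stored as one tuple

    def set(self, i: int, v) -> None:
        # Grow TF on demand if the base has acquired new elements.
        if i >= len(self.tf):
            self.tf.extend([None] * (i + 1 - len(self.tf)))
        self.tf[i] = v

    def get(self, i: int):
        return self.tf[i] if i < len(self.tf) else None

    def values(self) -> list:
        # Global operations (iteration, copying) walk one compact list.
        return [v for v in self.tf if v is not None]


B = RemoteBase()
i, j = B.insert("a"), B.insert("b")
F = RemoteMap()
F.set(j, 10)                  # F('b') := 10
```

The `values` method shows why the remote form suits global operations: iteration touches one contiguous list instead of every element block of the base.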

Similar local and remote representations exist for
subsets. If a set S is addressed only by insertions,
deletions, membership tests and cardinality checks, then it
can advantageously be represented by individual bits
attached to the elements of a base B. If X is an element of
B, then the test 'X in S' is performed by examining the
single bit in the element block of X which determines
membership in S. Since we may want several subsets to be
represented in this fashion, each element block in a base
is allowed to contain a field whose i-th bit indicates
membership of the corresponding element of B in the i-th
subset based on B. Subsets represented in this way are said
to be local subsets.
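Local subsets can be modelled in Python as follows. This is a hypothetical sketch, assuming one membership word per element block (an unbounded Python int here) and a subset header holding the subset's bit position and a size counter for cardinality checks; the class names are ours.

```python
# Hypothetical sketch of local subsets of a base B: each element block has
# one word of membership bits, and the i-th subset based on B owns bit i.
class ElementBlock:
    def __init__(self, value) -> None:
        self.value = value
        self.subset_bits = 0      # bit i set <=> element belongs to subset i


class LocalSubset:
    def __init__(self, bit: int) -> None:
        self.bit = bit            # bit position assigned to this subset
        self.size = 0             # kept in the header for cardinality checks

    def __contains__(self, blk: ElementBlock) -> bool:
        return ((blk.subset_bits >> self.bit) & 1) == 1

    def add(self, blk: ElementBlock) -> None:
        if blk not in self:
            blk.subset_bits |= 1 << self.bit
            self.size += 1

    def remove(self, blk: ElementBlock) -> None:
        if blk in self:
            blk.subset_bits &= ~(1 << self.bit)
            self.size -= 1


a, b = ElementBlock("a"), ElementBlock("b")
S, T = LocalSubset(bit=0), LocalSubset(bit=1)
S.add(a)
T.add(a)
T.add(b)
T.remove(a)
```

Insertion, deletion, membership and cardinality each touch one element block or one header field, which is why these four operations are cheap for local subsets.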

Local subsets support efficient insertion, deletion,
membership test and cardinality check operations, but
global operations such as union and intersection are
inefficient when applied to local subsets, since they must
visit the element block of every element of B. For such
operations, bit-vectors, which can make use of hardware
bit-parallel operations, are a more appropriate choice. For
sets with this representation, the index i attached to
element X of the base B (described above in connection with
remote maps) is also used to index the representing
bit-vector; if bit i is on in the bit-vector representation
of S, this indicates that X ∈ S. Membership tests
addressing such structures are slightly slower than
membership tests for local subsets, as one additional
indexing operation is involved.
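Here is a hypothetical Python sketch of remote subsets. It assumes each element block stores the sequential index also used by remote maps, and uses a Python int as the bit string. Union and intersection then become single bitwise operations, while a membership test needs the extra step of fetching the element's index.

```python
# Hypothetical sketch of remote subsets: each element block of the base
# stores a sequential index; a subset is a bit string (a Python int here)
# whose bit i records membership of the element with index i.
class ElementBlock:
    def __init__(self, value, index: int) -> None:
        self.value = value
        self.index = index        # sequential index, shared with remote maps


class Base:
    def __init__(self) -> None:
        self.blocks = {}

    def insert(self, value) -> ElementBlock:
        if value not in self.blocks:
            self.blocks[value] = ElementBlock(value, len(self.blocks))
        return self.blocks[value]


class RemoteSubset:
    def __init__(self, bits: int = 0) -> None:
        self.bits = bits

    def add(self, blk: ElementBlock) -> None:
        self.bits |= 1 << blk.index

    def __contains__(self, blk: ElementBlock) -> bool:
        # One extra indexing step compared with a local subset: fetch the
        # element's index, then test that bit of the string.
        return ((self.bits >> blk.index) & 1) == 1

    def __or__(self, other: "RemoteSubset") -> "RemoteSubset":
        return RemoteSubset(self.bits | other.bits)    # union

    def __and__(self, other: "RemoteSubset") -> "RemoteSubset":
        return RemoteSubset(self.bits & other.bits)    # intersection


B = Base()
a, b, c = (B.insert(v) for v in "abc")
S1, S2 = RemoteSubset(), RemoteSubset()
for blk in (a, b):
    S1.add(blk)
for blk in (b, c):
    S2.add(blk)
```

Computing `S1 & S2` processes a whole machine word of elements at a time, in contrast with local subsets, where the same operation would visit every element block of B.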

[Fig. 3. Representation for local based objects: a local set header (set specifier S: set(∈B)) and a local map header (map specifier f: map(∈B)), each recording a pointer to the template in the base, hash, size and the word in the element block (plus, for sets, the bit position).]

[Fig. 4. Representations for remote based objects: a remote set (set specifier S: set(∈B); bit string, size) and a remote map (map specifier m: map(∈B)∈B'; tuple of values, pointer to template in base, hash, size, max. index, undefined value), with a base array pointing to the headers of B and B'. The base array contains links to the bases of based objects, used during execution-time mode conversions.]

The data structures described so far are in general more
compact than their unbased counterparts. There are cases,
however, in which they may lead to very inefficient storage
use. If a map F based on B is defined over only a small
subset SB of B, then #B - #SB words (one word being
allocated for F in each element block of B) will be wasted.
In this case it can be preferable to represent F as a
standard hash-table, while retaining the '∈B' mode for the
elements of its domain. This leads to the so-called sparse
representation. Sparse representations are still slightly
more efficient than completely unbased ones in terms of
access: if X is ∈B and F is a sparse map domain-based on B,
then the hash code for X is stored in its element block in
B, and need not be recomputed to access the hash-table for
F when retrieving F(X) (this works because the hash code of
a value is defined in a system-wide invariant way). The
same technique can also be applied to subsets; sets
represented in this fashion are called sparse sets. The
specific way in which a based object is ultimately
represented (locally, remotely or sparsely) will be called
the 'representation attribute' of the based object.
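A sparse map can be sketched in Python as an ordinary hash table that reuses the hash code cached in each element block. The sketch below is hypothetical: the bucket count is fixed (no breathing) for brevity. Keys are compared by identity, which is valid only because equal values share one element block in the base.

```python
# Hypothetical sketch of a sparse map based on B: F is an ordinary hash
# table, but each element block caches its value's hash code, so looking up
# F(X) for X in B never rehashes the (possibly composite) value.
class ElementBlock:
    def __init__(self, value) -> None:
        self.value = value
        self.hash = hash(value)   # system-wide hash, valid for every table


class SparseMap:
    def __init__(self, n_buckets: int = 8) -> None:
        # Fixed size for brevity; a real table would also "breathe".
        self.buckets = [[] for _ in range(n_buckets)]

    def _clash_list(self, blk: ElementBlock) -> list:
        return self.buckets[blk.hash % len(self.buckets)]

    def __setitem__(self, blk: ElementBlock, v) -> None:
        lst = self._clash_list(blk)
        for k, (b, _) in enumerate(lst):
            if b is blk:          # equal values share one block in B
                lst[k] = (blk, v)
                return
        lst.append((blk, v))

    def get(self, blk: ElementBlock, default=None):
        for b, v in self._clash_list(blk):
            if b is blk:
                return v
        return default


x = ElementBlock(("a", ("composite", "value")))
y = ElementBlock("other")
F = SparseMap()
F[x] = 1
F[x] = 2                      # overwrite: F(x) := 2
```

Only elements in the domain of F occupy space in its table, so the #B - #SB wasted words of the local form are avoided.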

The considerations that we have just described lead to the
structures pictured in figures 1 to 4. Further details can
be found in Dewar et al. [1977b].


2.3 Bases


The data structures described in the previous section
derive their usefulness from the pointer and index
mechanisms which they implicitly make available. However,
use of these efficiency-oriented pointer mechanisms creates
the potential for conflict with the strict value semantics
of SETL. For example, if X has been declared '∈B', and F(X)
has been given a value once, subsequent deletion of X from
B might be feared to have unanticipated side-effects:
should we consider F(X) to still be defined? If a set S is
based on B and an element X of B was in S, should we say
that the domain basing of S on B is invalid after X has
been deleted from B? To permit such side-effects is
unacceptable, since it would give programs different
semantics depending on whether or not based representations
had been declared. To avoid this, we insist that bases are
not program variables, i.e., they are not explicitly
created or modified by the user's code. The value of a base
is defined only by the collection of all objects based on
it. In a 'declared' program (i.e., one in which based
representations have been declared for variables), bases
are built and updated in response to operations which
create objects declared to be based on them. For example,
if the declaration X ∈ B has been supplied, then whenever X
receives a value which cannot be determined a priori to be
already in B, a hash-search in B is performed, and if the
value of X is not already there, it is inserted into B.
This implies that, in the absence of some compaction
mechanism, bases can only grow monotonically during program
execution. It also means that much of the cost of using
based representations is incurred in the build-up of bases.
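The build-up rule can be illustrated with a small hypothetical Python model. User code only ever assigns to the member-based variable. The base (here a dict of element blocks, with an insertion counter of our own) grows as a side effect and is never shrunk, since no compaction is modelled.

```python
# Hypothetical sketch: user code never assigns a base directly; the base
# grows as a side effect of giving values to variables declared to be
# elements of it, and nothing is removed without a separate compaction.
class Base:
    def __init__(self) -> None:
        self.blocks = {}
        self.insertions = 0

    def locate_or_insert(self, value) -> dict:
        # Hash search in B; insert the value if it is not already present.
        blk = self.blocks.get(value)
        if blk is None:
            blk = {"value": value}
            self.blocks[value] = blk
            self.insertions += 1
        return blk


class MemberVar:
    """A program variable declared as an element of `base`."""

    def __init__(self, base: Base) -> None:
        self.base = base
        self.block = None

    def assign(self, value) -> None:
        self.block = self.base.locate_or_insert(value)

    @property
    def value(self):
        return self.block["value"]


B = Base()
X = MemberVar(B)
for v in ["p", "q", "p", "r"]:
    X.assign(v)
```

After the loop X holds only 'r', yet B still contains 'p', 'q' and 'r'. This shows both the monotone growth of bases and the insertion cost charged to their build-up.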

Note that when a user introduces basing declarations into a
program, the bases are given names which are distinct from
any identifiers already present in the program; but these
bases may actually turn out to be identical in value to
some actual program variables. For example, the variable S
may be declared to be domain based on B, but it might be
possible to ascertain that S = B, because B receives
elements only when elements are added to S, and no element
is ever deleted from S. A SETL compiler able to recognise
this might choose to treat S itself as a base. This
illustrates a very general principle of our basing system:
the processor will use pointer mechanisms as far as it
safely can, but the SETL value semantics visible at the
user level will be preserved faithfully.


2.4 Details of the Basing Language

We  shall  now  describe  the  syntax  and  semantics  of 
basing  declarations  in  more  detail. 


The generic term 'mode' is used in our system to refer to
the total information defining the data representation of a
program variable. For each of the modes which we admit, a
symbolic notation is introduced. The family of notations
which thereby arises constitutes our data structure
representation language, or basing language. Among these
modes, those related to bases play the most important
roles, as explained above.

Each variable in a SETL program can be declared to have a
mode. A mode descriptor specifies the SETL type of a
variable and, in addition, gives complete or partial
structural information about it, e.g., its size and its
relationship to other variables. Once a variable is
declared, its mode is static, i.e., it can never be
changed. The modes of undeclared variables are determined
automatically by the language processor, but may be left
'general'. A related set of rules also determines the modes
of compiler-generated temporaries.

In general, mode descriptors can be classified into four
categories: 1) primitive modes, 2) bases, 3) derived modes
and 4) composite modes.

2.4.1  Primitive  Modes 


Primitive modes correspond to the primitive types of SETL:
int, chars, bits, real and atom. An optional range
descriptor may be appended to the first three to indicate
the minimum and maximum size of the corresponding object.
For example:

    repr X, Y, Z : int(0,1000) ;
         C1, C2 : chars(100) ;
         R : real ;
    end ;

As this illustrates, the range descriptor part of a
primitive mode has the form (n1,n2), where n1 and n2 are
integers. The range specifier (0,n2) can be abbreviated
(n2).

2.4.2  Bases 

The declaration 'base(M)' describes a base whose elements
have mode M. Bases must be declared before appearing within
other mode descriptors. Giving a mode descriptor for the
base elements is optional.

2.4.3  Derived  Modes 

Derived modes indicate relationships to a specific base.
The only mode descriptor belonging to this category is the
member basing mode (e.g., ∈B), which specifies that the
value of a program variable is an element of a base B.

2.4.4  Composite  Modes 
Mode descriptors for sets, tuples and maps are constructed
recursively from other modes, using the following
constructions:

1) set(M)(SZ)

   describes a set whose elements have mode M, and whose
   expected size is SZ. Here, as elsewhere, the size
   parameter is optional. If M is a mode of the form ∈B,
   where B is a base, the attribute keyword 'local',
   'remote' or 'sparse' can be used before the keyword
   'set'. The alternative notation {M}, which is
   equivalent to set(M), is also allowed.

2) tuple(M)(SZ)

   describes a tuple whose components have mode M and
   whose expected size is SZ.

3) tuple(M1,M2,...)

   describes a tuple of known length whose components have
   the specified modes M1, M2, etc. The alternative
   notation [M1,M2,...], which is equivalent to
   tuple(M1,M2,...), is also allowed.

4) map(M1)M2

   is the mode descriptor for a map from objects of mode
   M1 to objects of mode M2, i.e., the domain and the range
   of the map have the modes {M1} and {M2} respectively.
   If M1 is a mode of the form ∈B, the keywords 'local',
   'remote' and 'sparse' can be used in the same way as
   for a domain based set.

5) smap(M1)M2

   is the mode descriptor for a single-valued map. M1 and
   M2 have the same significance as for 'map'.

6) mmap(M1)M2

   is the mode descriptor for a multi-valued map, or
   relation, whose domain is of mode {M1}. Since the set of
   images of any point X of its domain has mode M2, M2 must
   be a designator for a set mode.

7) The three previous mode designators are extended to
   multi-variate maps. For example, we can write

    repr F : map(∈B, ∈B) int ;
         G : mmap(∈B1, ∈B2, ∈B3) {tuple(chars)} ;
    end ;


For variables whose mode may vary during program execution,
the mode 'general' can be specified. Whenever an object X
of general mode is assigned to a declared variable Y, the
mode of X is checked dynamically and its conformity with
the mode of Y is verified. If the modes are not compatible,
the program terminates immediately. In the reverse case,
when Y is assigned to X, X inherits the mode of Y and no
mode conversion is required.
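The run-time check can be sketched in Python. This is a hypothetical model only: the tuple encoding of modes (`("int", lo, hi)`, `("set", element_mode)`, `"general"`) is our own, and program termination is modelled with `sys.exit`.

```python
# Hypothetical sketch of assigning a value of mode 'general' to a declared
# variable: the value's actual mode is checked at run time, and a mismatch
# terminates the program. The tuple encoding of modes is our own.
import sys


def conforms(value, mode) -> bool:
    # Illustrative modes: "general", ("int", lo, hi), ("set", element_mode).
    if mode == "general":
        return True
    if mode[0] == "int":
        return isinstance(value, int) and mode[1] <= value <= mode[2]
    if mode[0] == "set":
        return isinstance(value, (set, frozenset)) and all(
            conforms(v, mode[1]) for v in value)
    return False


def assign_from_general(value, declared_mode):
    # Y := X, where X is general and Y is declared with declared_mode.
    if not conforms(value, declared_mode):
        sys.exit(f"mode error: {value!r} does not conform to {declared_mode!r}")
    return value
```

The reverse direction needs no such check: a general variable simply takes on the mode of whatever declared value it receives.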

It is sometimes convenient to bypass the basing mechanism,
and to specify that the representation of a given object is
to be disjoint from that of any other, i.e., 'unbased'. If
the mode 'unbased' is specified for a variable X, then
whenever, in the course of program execution, X receives a
value whose representation was based, this value will be
reconverted to an unbased form before being assigned to X.
Note that we also allow variables to be undeclared (rather
than unbased). The processor supplies a basing for
undeclared variables, consistent with the ways in which
such variables are used and assigned.

Program variables may be declared to have multiple modes.
This useful extension of the data representation language
allows the user to create multiple representations of the
same object. For example,

    repr S : set(∈B1), ∈B2 ; end repr ;

allows us to describe S as a domain based subset of B1
which is also to be considered as an element of a second
base B2. If this declaration is used, then the maps based
on B1 can be accessed via elements of S, while the maps
defined on S can be based on B2. Another example is

    repr X : ∈B1, ∈B2, ∈B3 ; end repr ;

which specifies that the object X is to have three
simultaneous representations, as an element of each of the
bases B1, B2 and B3.

A  user  can  introduce  new  mode  names  with  the 
declarative  statement  : 

mode  MODENAME  :  mode-descriptor  ; 

The MODENAME can then be used subsequently in the program
as a valid mode descriptor. The only restriction is that a
mode name can be defined at most once and should be
different from language reserved words and program
variables.


2.5 A Case Study of the Application of Basings


As an example illustrating the use of basings, we present a
version of the interval analysis procedure introduced by
Cocke and Allen; cf. Allen [1972], Schaefer [1973]. The
original 'unbased' version of this program is taken from
Schwartz [1973], pp. 221-223. To ease understanding, we
shall first describe the logic of this procedure
informally. Then we give basing declarations as well as
program code, and analyse the advantages obtained by the
use of basings.
2.5.1 Program Logic

We assume a directed program graph G to be given as a set
NODES and a function CESOR which maps each ND ∈ NODES into
the set CESOR(ND) of all its successors. One particular
member of NODES is assumed to be distinguished as the entry
node of G.

An interval in G is a set S of nodes, containing a
distinguished node X called the head of S, such that there
is no entry into S except through X, every other node in S
can be reached from X along a path wholly contained in S,
and S is free of loops (i.e., of closed paths) once X is
removed. It is a characteristic property of intervals that
their nodes can be enumerated in such an order that, with
the exception of branches terminating at the interval head,
all branches between nodes of the interval are 'forward'
branches, i.e., go from a node to a node having a larger
serial number in the enumeration of the interval. Such an
enumeration of nodes is said to be an enumeration in
interval order. The interval of a node X is the largest
interval with X as head; it may consist of X only. The
procedure INTERVAL shown below determines the interval of a
node X, and enumerates the nodes of this interval in
interval order.
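The definition of an interval in interval order can be made operational with a small checker. This is a hypothetical Python sketch, not part of the original program. A graph is assumed to be a dict mapping each node to an iterable of its successors, mirroring CESOR, and the candidate interval is a list with its head first.

```python
# Hypothetical checker for the definition above: does `order` list an
# interval (head first) in interval order? Graphs are dicts mapping each
# node to an iterable of its successors, mirroring CESOR.
def is_interval_order(order: list, succ: dict) -> bool:
    head = order[0]
    pos = {x: i for i, x in enumerate(order)}   # serial numbers
    has_pred = {head}                           # nodes entered from inside S
    for x, ys in succ.items():
        for y in ys:
            if y not in pos or y == head:
                continue                        # leaves S, or returns to the head
            if x not in pos:
                return False                    # entry into S not through the head
            if pos[x] >= pos[y]:
                return False                    # a non-forward internal branch
            has_pred.add(y)
    # Forward branches from earlier nodes make every node reachable from
    # the head and leave S free of loops once the head is removed.
    return has_pred == set(pos)
```

For the graph `{1: [2, 5], 2: [3], 3: [4], 4: [2, 5]}`, the list `[2, 3, 4]` passes and `[2, 4, 3]` fails. `[2, 3]` also passes, because it is an interval with head 2, though not the largest one; this is why the text singles out the largest such set as the interval of a node.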

An interval is called maximal if it is not contained in any
larger interval. It can be shown that every program graph
can be decomposed uniquely into a union of maximal
intervals, and that distinct maximal intervals are
disjoint. To find these maximal intervals, we proceed as
follows. Take the interval generated by the entry node of
the program graph; this is a first maximal interval. Then
take any node X which is a successor of some node Y
belonging to an interval already formed, but which does not
itself belong to an interval already formed. Form the
interval of X; this is a new maximal interval. The routine
INTERVALS given below realises this process. It also
associates with each maximal interval INT the set
FOLLOW(INT) of all nodes which are successors of a node of
INT but do not belong to INT, and associates with each node
B of G the maximal interval INTOV(B) which contains it.

The derived graph G' of a program graph G is defined as
follows: the nodes of G' are the maximal intervals of G;
the successors of an interval INT are the intervals
distinct from it which contain successors of the nodes
within INT; the entry node of G' is the interval containing
the entry node of G. The derived graph of G is built up by
the routine DG given below.

A program graph in which there exists no interval
containing more than one node is called an irreducible
graph; fortunately, such graphs arise only rarely in
connection with actual programs. In SETL, we may write the
condition of irreducibility very simply as

    (#NODES) = #INTERVALS(NODES).

By forming successive derived graphs G', (G')', etc. of an
original graph G, we obtain the derivation sequence of G.
In cases in which this sequence converges to a graph
consisting of a single node, the interval-analysis method
which has just been outlined gives a decisive account of
program flow. In particular, it determines the order in
which many other optimization-related processes should be
applied to G and to the program P of which G is the flow
graph. The derived sequence of G is built up by the main
routine given below.
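The whole derivation process can be traced with a hypothetical Python rendering of the procedures outlined above. This is a sketch, not the thesis code. It assumes graphs are dicts from nodes to iterables of successors, every node is reachable from the entry, and nodes are mutually comparable (sorting is used only to make the enumeration reproducible). Intervals become tuples so they can serve as nodes of the derived graph.

```python
# Hypothetical Python rendering of the derivation-sequence computation.
# Graphs are dicts mapping a node to an iterable of successors; intervals
# become tuples so they can serve as nodes of the derived graph.
def all_nodes(succ: dict) -> set:
    return set(succ) | {y for ys in succ.values() for y in ys}


def interval(head, succ: dict):
    # Predecessor counting, as in procedure INTERVAL (L21-L32).
    npreds = dict.fromkeys(all_nodes(succ), 0)
    for ys in succ.values():
        for y in ys:
            npreds[y] += 1
    count = dict.fromkeys(npreds, 0)
    count[head] = npreds[head]
    order, followers = [], {head}
    while newin := {y for y in followers if npreds[y] == count[y]}:
        for z in sorted(newin):          # sorted only for reproducibility
            order.append(z)
            followers.discard(z)
            for u in succ.get(z, ()):
                if u != head:
                    count[u] += 1
                    followers.add(u)
    return tuple(order), followers


def intervals(entry, succ: dict):
    # Procedure INTERVALS (L12-L20): returns INTS, INTOV and FOLLOW.
    ints, intov, follow = [], {}, {}
    seen, heads = {entry}, {entry}
    while seen:
        node = seen.pop()
        heads.add(node)
        j, followers = interval(node, succ)
        ints.append(j)
        follow[j] = followers
        for nd in j:
            intov[nd] = j
        seen |= followers - heads
    return ints, intov, follow


def derived_graph(entry, succ: dict):
    # Procedure DG (L08-L11): CESOR(I) := INTOV[FOLLOW(I)].
    ints, intov, follow = intervals(entry, succ)
    return {i: {intov[n] for n in follow[i]} for i in ints}, intov[entry]


def derivation_sequence(entry, succ: dict) -> list:
    # Main routine (L01-L07): keep deriving while the graph shrinks.
    seq = [(all_nodes(succ), entry)]
    while True:
        new_succ, new_entry = derived_graph(entry, succ)
        if len(new_succ) >= len(all_nodes(succ)):
            return seq
        seq.append((set(new_succ), new_entry))
        succ, entry = new_succ, new_entry
```

For the graph `{1: [2, 5], 2: [3], 3: [4], 4: [2, 5]}` the sequence has three graphs and converges to a single node. The classic irreducible graph `{1: [2, 3], 2: [3], 3: [2]}` yields no reduction at all.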


2.5.2  The  Interval  Analysis  Code 

Now we present detailed code and basing declarations for
the algorithm described above.


module INT-ANALYSIS ;

$ First we declare all global variables.

$ ALLNODES is a base on which other variables are based.
$ CESOR maps each node and interval into the set of its
$ successor nodes or intervals.
$ INTOV maps each node into the interval containing it.
$ NODES is the set of nodes in the current graph.
$ FOLLOWERS is the set of all nodes which follow some node
$ of an interval I being constructed, but have not yet been
$ added to I.
$ INTS is the set of intervals in the current graph.
$ FOLLOW maps each interval I into the nodes which follow
$ a node of I but do not belong to it.
$ ENTRYINT is the entry node of a derived graph.

vars
  ALLNODES, CESOR, INTOV,
  NODES, FOLLOWERS, INTS, FOLLOW, ENTRYINT
end vars ;

$ We then declare based representations of these variables.
repr
  ALLNODES : base ;
  NODES, INTS : sparse {∈ALLNODES} ;
  CESOR, FOLLOW : local smap(∈ALLNODES) sparse {∈ALLNODES} ;
  INTOV : local smap(∈ALLNODES) ∈ALLNODES ;
  FOLLOWERS : sparse {∈ALLNODES} ;
  ENTRYINT : ∈ALLNODES ;
  ENTRY : ∈ALLNODES ;
  SEQ : tuple(tuple(sparse {∈ALLNODES}, ∈ALLNODES)) ;
end repr ;

$ This is the main routine, which constructs the entire
$ derived sequence of a graph.

L01: read CESOR ;
L02: read ENTRY ;
L03: NODES := domain CESOR + {N2 : N1 ∈ domain CESOR, N2 ∈ CESOR(N1)} ;
L04: SEQ := [[NODES, ENTRY]] ;
L05: (while #DG(ENTRY) < #NODES)
L06:    SEQ with [INTS, ENTRYINT] ;
L07:    [NODES, ENTRY] := [INTS, ENTRYINT] ;
     end while ;
     print SEQ ;

proc DG(ENTRY) ;
$ This routine constructs the derived graph of a graph.
repr
  ENTRY, I : ∈ALLNODES ;
end repr ;

L08: INTS := INTERVALS(ENTRY) ;
L09: ENTRYINT := INTOV(ENTRY) ;
L10: (∀ I ∈ INTS)
L11:    CESOR(I) := INTOV[FOLLOW(I)] ;
     end ∀ I ;
     return INTS ;
end proc DG ;

proc INTERVALS(ENTRY) ;
$ This routine constructs all intervals of a graph.
$ SEEN is the set of all nodes which are successors of some
$ node in an interval already constructed but which have
$ not themselves been added into any interval.
repr
  ENTRY, NODE, ND : ∈ALLNODES ;
  SEEN, HEADS : remote {∈ALLNODES} ;
  J : tuple(∈ALLNODES), ∈ALLNODES ;
end repr ;

L12: INTS := nl ; FOLLOW := nl ; INTOV := nl ;
L13: SEEN := {ENTRY} ; HEADS := {ENTRY} ;
L14: (while SEEN /= nl)
L15:    NODE from SEEN ;
L16:    HEADS with NODE ;
L17:    INTS with (J := INTERVAL(NODE)) ;
L18:    FOLLOW(J) := FOLLOWERS ;
L19:    (∀ ND = J(N)) INTOV(ND) := J ; end ∀ ND ;
L20:    SEEN := SEEN + (FOLLOWERS - HEADS) ;
     end while ;
     return INTS ;
end proc INTERVALS ;

proc INTERVAL(NODE) ;
$ This routine constructs the interval with NODE as
$ the head node.
repr
  NODE, X, Y, U : ∈ALLNODES ;
  NPREDS, COUNT : local smap(∈ALLNODES) int ;
  NEWIN : sparse {∈ALLNODES} ;
  Z : ∈ALLNODES ;
  INT : tuple(∈ALLNODES) ;
end repr ;

$ Count the number of predecessors of every node.
L21: (∀ X ∈ NODES) NPREDS(X) := 0 ; COUNT(X) := 0 ; end ∀ X ;
L22: (∀ X ∈ NODES, Y ∈ CESOR(X)) NPREDS(Y) := NPREDS(Y) + 1 ; end ∀ X ;

$ Initialize the interval under construction to be null, and
$ set FOLLOWERS to be {NODE}.
L23: INT := nult ;
L24: FOLLOWERS := {NODE} ;

$ Set COUNT(NODE) equal to the number of predecessors of NODE.
L25: COUNT(NODE) := NPREDS(NODE) ;
L26: (while (NEWIN := {Y ∈ FOLLOWERS | NPREDS(Y) = COUNT(Y)}) /= nl)
L27:    (∀ Z ∈ NEWIN)
L28:       INT with Z ;
L29:       FOLLOWERS less Z ;
L30:       (∀ U ∈ CESOR(Z) | U /= NODE)
L31:          COUNT(U) := COUNT(U) + 1 ;
L32:          FOLLOWERS with U ;
           end ∀ U ;
        end ∀ Z ;
     end while ;
     return INT ;
end proc INTERVAL ;

end module INT-ANALYSIS ;


2.5.3  Efficiency  Analysis 

We now analyze the effect of the basing choice described
above on the execution efficiency of this program.

L01: Since CESOR is a map based on ALLNODES with images
     also based on ALLNODES, each component of the elements
     of CESOR (these elements are all pairs) has to be
     inserted into ALLNODES if it is not already there.
     This is the typical overhead incurred when we read a
     based object.

L02: A base pointer has to be derived for ENTRY after its
     value is read in.

L09: No hash-search operation is required in performing
     this assignment, because INTOV is based on the same
     base as ENTRY, and the values of INTOV have the same
     mode (∈ALLNODES) as ENTRYINT.

L10,L11: The efficiency of this loop is considerably
     improved by the basing declarations. Since I is based
     on ALLNODES, no hash-search operation is required to
     reach FOLLOW(I) and CESOR(I). Similarly, no hash-search
     is required to compute INTOV[FOLLOW(I)].

L12: Since FOLLOW and INTOV are based maps, it will take
     slightly longer to initialize their images than to
     initialize a standard null set.

L18,L19,L21,L22,L23: No hash-search or conversion
     operations are required. Note that, except for the
     iteration over J in L19, which utilises the basing
     'tuple(∈ALLNODES)', other occurrences of J use its
     member basing.

L26: NEWIN is created slightly more efficiently than if it
     were unbased. This is because the hash code of Y is
     available when Y is retrieved from FOLLOWERS; no
     hashing is required when Y is inserted into NEWIN.

L29,L32: No hashing of Z (or U) is necessary; but a search
     for Z (or U) in the hash table representing FOLLOWERS
     is still required.


L30: The inequality test can be done by comparing the
     basing pointers of U and NODE, since both are based on
     ALLNODES; considerable advantage is achieved by this,
     especially when processing derived graphs, in which
     each node is represented by a composite object.

L31: No hash-search is required.

In summary, by basing the variables in this program we have eliminated most of the hash-search operations that would otherwise be required at run time. The price we pay for this is that we have to insert into the base ALLNODES each interval J created at L17, as well as each component of the elements of CESOR read in from the input. Since the cost of inserting an element into a base is roughly the same as the cost of a hash-search operation, we can estimate the efficiency gain of our basing choice by comparing the frequency of the insertion operations we have introduced with the frequency of the hash-search operations saved. Clearly, the former have much lower total frequency than the latter, and therefore our basing choice is indeed advantageous.


2.5.4 Comments on the Foregoing Example


The example presented above shows that by properly selecting the basing of each program variable, consistent basing relations can be defined, eliminating most of the hash-search operations that would otherwise be required, especially in nested loops. Note that we use the term 'consistency of basings' to indicate that few or no conversions are implied by value assignments. Achieving this property is one of the major issues in devising basings. When suitably high basing consistency is achieved, the efficiency of SETL programs will be improved considerably.

The selection of proper basing modes for program variables is not as simple as one might think. Basing choices cannot be made simply by determining inclusion/membership relations among variables: it will not always be appropriate to base a subset T of a set S on S. As an illustration of this, note that in the preceding example FOLLOWERS is a subset of NODES but is not based on NODES. Attaining consistency of basings among program variables is the most important issue involved in basing choice. We hope to do this automatically in many cases, but it appears that in some cases consistency can only be achieved through an understanding of program logic. In such cases, some (hopefully quite small) degree of program reconstruction will be required.

CHAPTER 3 : AN AUTOMATIC DATA STRUCTURE CHOICE SYSTEM


The basing declaration language provides a flexible tool which can alleviate much of the work needed to realize an algorithm efficiently. However, a substantial effort is still required to choose good basings. Our long-range goal is to remove this burden from users of the SETL system, i.e., to generate good basings without user intervention. We have attacked this goal empirically by taking a representative variety of algorithms and making the basing choices for them manually. By noting the facts which repeatedly appear relevant in this process, we have taken a first essential step toward mechanizing data structure choice. The automatic system which we have constructed can therefore be regarded as an embodiment of various heuristic principles which grew out of our systematized experience of manual basing choice.

3.1 Essential Observations

Based representations provide a systematic mechanism for optimizing set-theoretic operations. The gain they can provide is twofold.

A) If basings are properly chosen, operations on based sets and maps can be performed without hash-table searches. The hashing and clash-list scanning otherwise required can be replaced by indexing operations.

B) If good basing choices succeed in simplifying SETL operations sufficiently, the code for the remaining SETL primitives can be emitted on-line, eliminating the interpretive overhead imposed by the calls to off-line hash-table accessing procedures.

The main costs involved in using based representations are also twofold:

A) When based objects are generated, their bases are built simultaneously behind the scenes. Inserting a new element into a set forces its parallel insertion into the corresponding base, an operation slightly more expensive than normal (unbased) set insertion, because an element block, generally several words long, must be allocated.

B) Bases are bulky: each element block must accommodate the values of all based functions that are defined on some subset of the base. If the domains of these functions cover only a small portion of the base, the wasted space can be considerable.
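
The two gains and the two costs can be seen together in a small model. The Python sketch below is ours and purely illustrative: the class names Base and BasedMap, and the use of a list index as the basing pointer, are assumptions rather than the SETL run-time structures. Only locate() hashes; a based map is a slot in each element block, so a retrieval through a basing pointer is plain indexing, while every block reserves a slot for every based map whether or not that map is defined there.

```python
class Base:
    """A toy base: a hash table from values to element blocks.

    Each element block has one slot per based map defined on the base.
    locate() is the only operation that hashes; it returns a basing
    pointer (here simply an index into self.blocks).
    """

    def __init__(self, map_names):
        self.index = {}                # value -> basing pointer (the hash table)
        self.blocks = []               # element blocks, addressed by pointer
        self.map_names = list(map_names)
        self.locates = 0               # hash-searches and insertions so far

    def locate(self, value):
        """Hash-search for value, inserting a new element block if absent."""
        self.locates += 1
        ptr = self.index.get(value)
        if ptr is None:
            ptr = len(self.blocks)
            self.index[value] = ptr
            # Every block reserves a slot for every based map (cost B).
            block = {name: None for name in self.map_names}
            block["value"] = value
            self.blocks.append(block)
        return ptr

    def wasted_slots(self):
        """Map slots that are allocated but undefined."""
        return sum(1 for block in self.blocks
                   for name in self.map_names if block[name] is None)


class BasedMap:
    """A map stored locally in the element blocks of its domain base."""

    def __init__(self, base, name):
        self.base = base
        self.name = name

    def store(self, ptr, image):
        # Plain indexing through the basing pointer: no hashing (gain A).
        self.base.blocks[ptr][self.name] = image

    def fetch(self, ptr):
        return self.base.blocks[ptr][self.name]


B = Base(["F", "G"])       # two based maps, F and G, share the base B
F = BasedMap(B, "F")
a = B.locate("a")          # locates are paid where values are created ...
b = B.locate("b")
F.store(a, b)              # F("a") := "b", with the image kept as a pointer
for _ in range(1000):
    F.fetch(a)             # ... so repeated retrievals perform no locates
print(B.locates)           # 2
print(B.wasted_slots())    # 3: F("b"), G("a") and G("b") are undefined
```

In this toy run the two locates are paid once, when the values are created, and the thousand retrievals cost none; the three empty slots are the space overhead described under B) above.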


Leaving aside for now the question of storage optimization, it is important to notice that both the costs and the gains connected with the use of basings can be quantified in terms of the number of insertion operations and hash-searches performed. The generic term 'locate' is henceforth used to denote this class of operations, i.e., insertion operations and hash-searches. Note that the cost of set insertion is itself the combination of a hash-search and a storage request, and little will be lost if we disregard the latter. This means that we will disregard the difference between an unbased set insertion and a base insertion. If we use this somewhat simplified measure, the major goal of basing selection can be characterised as that of reducing the number of locate operations required during program execution.


3.2 Fundamental Idea


A locate operation can be avoided at a set or map operation if the arguments of the operation are properly based, i.e., if locate operations have been executed at certain points before the current instruction is reached. Thus the aim of basing choice may be defined as that of moving locate operations from the points at which an object is used to certain program points which follow the point at which the value of the object is created but precede its points of use. Such movement can reduce the number of locates required during execution, and thus increase execution efficiency, if the points at which locate operations are executed have lower frequencies than the instruction at which the object is subject to a set or map operation. In manual use of the basing system, this kind of movement of implicit locate operations is achieved by imposing proper basings on variable occurrences. Our basing selection must thus aim to uncover basings which have this effect.
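
Under the simplified measure of section 3.1, the condition for such a movement to pay off can be stated as an inequality. The notation is introduced here for illustration only: U is the set of use points whose implicit locates are eliminated, C the set of points at which locates are executed instead, freq(p) the execution count of program point p, and n(p) the number of locate operations performed by one execution of p.

```latex
\[
\sum_{c \in C} \mathrm{freq}(c)\, n(c) \;<\; \sum_{u \in U} \mathrm{freq}(u)\, n(u)
\]
```

Because freq(u) for a use point inside a loop grows with the number of iterations, while a creation point such as a read statement executes once, the inequality typically holds when locates are moved out of loops.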

With this idea in mind, let us consider the following example.

L1: read NODES ;
L2: read FATHER ;
L3: ROOT := nl ;
L4: (∀X ∈ NODES)
L5:     Y := X ;
L6:     (while FATHER(Y) /= Y)
L7:         Y := FATHER(Y) ; end ;
L8:     ROOT(X) := Y ;
    end ;
L9: print ROOT ;

Plainly, implicit locate operations will be required at instructions L6, L7 and L8 if no basings are used. These locate operations can be avoided by letting all the variables be based on the same base and carrying out locate operations during the read operations at L1 and L2. This basing choice is certainly profitable, because L6, L7 and L8 will have substantially higher frequency than L1 and L2. But how can we make such a basing choice systematically? The outline of a possible approach is as follows:

We examine the instructions at which implicit locate operations are required. For each of these instructions, we consider how the locate operations which it contains can be avoided. For example, at L6 and L7, no locate operation will be required if FATHER and Y are based on the same base. In order to see how such a common basing might be imposed with least expense, we consider the points at which the values of FATHER and Y are created. FATHER is created at L2, and Y is created at L1, since Y is an element of NODES. Thus we see that if NODES and FATHER are properly based at L1 and L2, then no locate operation will be required at L6, provided X is also based on the same base. This pattern of basings will be profitable because L1 and L2 have lower frequency than L6. We therefore introduce a base B and force NODES and FATHER to be based on B. Repeating the same procedure for L8, we are led to conclude that ROOT and X should be based on the same base. Since X is created at L1 and NODES has been determined to be based on B, we determine that ROOT is also to be based on B. This process associates basings with the objects at L1, L2 and L3. Using value flow, we can then propagate these basings to other variable occurrences. For example, the X appearing at L4 and L5 is given the basing '∈B'. As a final step, we must choose remote, local or sparse representation for certain composite objects. In the above example, NODES, FATHER and ROOT will all be given local representations, since none of them is subject to global operations such as set union or intersection.
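
To make the frequency argument concrete, the following Python sketch (an illustration, not part of the proposed system) replays the example and counts locate operations under the simplified measure of section 3.1. The cost model is an assumption: an unbased run performs one locate per element read and one at each execution of L6, L7 and L8, whereas the based run must insert both components of every FATHER pair into B at L2, in the same way that CESOR was handled at L01 in the earlier example, and then needs no locates inside the loops.

```python
def root_example(nodes, father_pairs, based):
    """Run the ROOT example; return (ROOT, number of locates executed).

    Assumed cost model (the simplified measure of section 3.1): each set or
    map insertion and each hash-search is one locate; operations on values
    that already carry a basing pointer into the right base are free.
    """
    locates = 0
    node_set = set()
    for x in nodes:                      # L1: read NODES
        node_set.add(x)
        locates += 1                     # set insertion or base insertion
    father = {}
    for x, f in father_pairs:            # L2: read FATHER
        father[x] = f
        locates += 2 if based else 1     # based: both components located in B
    root = {}                            # L3: ROOT := nl
    for x in node_set:                   # L4
        y = x                            # L5
        while True:
            if not based:
                locates += 1             # L6: hash-search for Y in FATHER
            if father[y] == y:
                break
            if not based:
                locates += 1             # L7: hash-search for Y in FATHER
            y = father[y]
        if not based:
            locates += 1                 # L8: hash-search for X in ROOT
        root[x] = y
    return root, locates


nodes = [1, 2, 3, 4]
father = [(1, 1), (2, 1), (3, 2), (4, 3)]   # a single chain rooted at 1
print(root_example(nodes, father, based=False)[1])   # 28
print(root_example(nodes, father, based=True)[1])    # 12
```

For this four-node chain the based version already needs fewer than half as many locates (12 against 28), and the gap widens as the FATHER chains get deeper, since only the unbased run pays for each trip around the L6/L7 loop.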

This example illustrates our scheme for automatic structure choice:

1. We examine the creation points of the values which appear as arguments to operations for which implicit locate operations are required.

2. We determine proper basings for the values created at the points which have been found in step 1.

3. We propagate these basings to other variables by using value flow.

4. Finally, we determine whether the composite objects which have been based are to have local, remote or sparse representation.
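
Step 3 amounts to a forward propagation over value-flow edges. The worklist sketch below is a hypothetical illustration: occurrence names such as X@L4 and the encoding of value flow as a successor map are our assumptions. An occurrence reached by two different bases is only reported here; in the system described below such clashes are handled by base equivalencing in phase II.

```python
from collections import deque


def propagate_basings(seeds, flow):
    """Propagate member basings forward along value-flow edges.

    seeds: dict occurrence -> base chosen at a creation point (step 2).
    flow:  dict occurrence -> occurrences its value may flow to.
    Returns (basing, conflicts): the base reaching each occurrence, and the
    occurrences reached by two different bases.
    """
    basing = dict(seeds)
    conflicts = set()
    work = deque(seeds)
    while work:
        occ = work.popleft()
        for succ in flow.get(occ, []):
            if succ not in basing:
                basing[succ] = basing[occ]
                work.append(succ)
            elif basing[succ] != basing[occ]:
                conflicts.add(succ)     # would need a conversion or a merge
    return basing, conflicts


# Part of the ROOT example; the occurrence names are hypothetical.
flow = {
    "X@L4": ["X@L5", "X@L8"],   # the loop variable reaches L5 and L8
    "X@L5": ["Y@L5"],           # Y := X
    "Y@L5": ["Y@L6", "Y@L8"],
}
basing, conflicts = propagate_basings({"X@L4": "B"}, flow)
print(basing["Y@L8"])   # B
print(conflicts)        # set()
```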

3.3 Overview of the System


We shall now begin to outline an automatic data structure choice system which rests on the fundamental idea explained above. The whole system consists of five distinct phases.

Phase I introduces a base for each 'live period' of a set or map. A 'live period' is used here to mean a set of occurrences of a given variable which are linked by the data flow maps FFROM and BFROM. The purpose of this phase is to simplify the processing needed in the subsequent phases.

Phase II merges the bases created in the preceding phase by equivalencing all imputed bases attached to ivariables of a single instruction. In addition, this phase emits the base insertion operations which enforce the postulated basing relations of composite objects. These insertion operations are generated by examining all operations involving hashing ('with', map storage, etc.) and by declaring the incorporated item to be an element of the corresponding base. For example, an appearance of X in 'S with X' forces X to be member based on the base B on which S is domain based.

Phase III optimizes the placement of the 'locate' instructions generated in previous steps. This amounts to performing a type of forward code motion on these instructions. The need for such code motion is clear from the following fragment:


(∀X ∈ S) Y := Y + 1 ;;
Z := F(Y) ;


The appearance of Y in a map retrieval operation suggests that Y should be an element of the domain base of F. However, a 'locate' instruction placed at the point of creation of Y will generate a number of useless basing pointers, because all of them (corresponding to successive values of Y in the loop) except the last one are dead (i.e., redefined before they are used). Phase III moves locate instructions out of such loops and places them at the lower-frequency program points where they are actually needed.

Phase IV builds up a detailed description of the mode of each variable occurrence. At this point, all useful basing pointers will have been created at inserted locate instructions. Every variable use which needs basing pointers can count on receiving them without having to execute any locate operations, as long as basing pointers are properly propagated. A base on which only one composite object is domain based is regarded as useless (since such a base cannot reduce the number of locate operations required during execution) and will therefore be dropped. This phase mainly determines how these basing pointers should be propagated.

Finally, phase V refines our basing choices by selecting local, remote or sparse representations for based sets and maps.

After this general introduction, we now begin to explain the detailed workings of each phase of our data structure choice system.

3.4 Phase I : Base Generation

This is a preparatory phase, which generates a base for each 'live period' of a set or map. A 'live period' is used here to mean a set of occurrences of a given variable which are linked by the chaining functions FFROM and BFROM. The purpose of this phase is to simplify the processing needed in the subsequent phases.


In order to understand the purpose of this phase, let us review the fundamental idea of our algorithm. Our intent is to examine the creation points of the values of the arguments of set and map operations, determine the basings of the ovariables at these creation points, and then propagate these basings to other variable occurrences. A difficulty in directly implementing this idea is that the basing propagation process may be somewhat complicated and inefficient. Since at an occurrence of a composite object we may need to know member basing pointers as well as domain basing and/or type information, quite complex information may have to be propagated. This can make the basing propagation process even less efficient than the type finder algorithm. Another difficulty is caused by the fact that the type finding algorithm in the SETL optimiser may determine that the types of certain variable occurrences are indefinite, e.g., set-or-map. Because the based representations of different types of composite objects are incompatible, based representations are unsuitable for composite objects of indefinite gross type, and we have to give such objects the standard representations. Moreover, objects of this kind can never carry basing pointers for their elements. This fact makes it necessary to revise the ideal rule that basing pointers are generated exclusively at creation points, since the basing pointers generated at creation points may be lost on the way to set or map operations. This is illustrated by the following example:


L1: Y := Y + 1 ;
L2: S := S with Y ;
L3: Z from S ;
L4: U := F(Z) ;


Suppose that F is a map and S is of indefinite type. In this case, we should not generate basing pointers at L1, even though Y is a creation point of the Z appearing at L4. Since S has standard representation, the Z appearing at L3 can never be based in a way which will eliminate an explicit locate operation, even though the Y inserted into S at L2 is based. Thus we will need to generate basing pointers at L3, after a value has been retrieved from S.

In order to overcome these difficulties, we slightly modify our fundamental idea by defining 'live periods' of composite objects and using pseudo creation points instead of creation points. A 'live period' is defined here to be a set of occurrences of a given variable which are linked by the data flow maps FFROM and BFROM. We treat the domain basings of composite objects differently from member basings. For each live period of a composite object, if all the occurrences in it are of the same gross type, we introduce a base as its domain base. However, no bases are introduced for live periods which consist of occurrences of indefinite gross type. Then at each set or map operation we impose the condition that the base of the set or map be the member base of the element objects which appear in the instruction. This information is propagated to all of the pseudo creation points of these element objects. A pseudo creation point of an occurrence X is an occurrence which is the ovariable of a value creation or value retrieval instruction and whose value can be transmitted to X through simple assignment instructions. All member basings transmitted to the same pseudo creation point are then identified, and explicit locate instructions are emitted. However, when a pseudo creation point X is the ovariable of a value retrieval instruction in which the composite object Y from which a value is to be retrieved is domain based, the basing mode of X is identified with the element mode of Y, but no locate instruction is generated, since X will inherit a basing pointer from Y. Finally, the member basings postulated at pseudo creation points are propagated through the program, and the bases of different composite objects are suitably merged.

The basing propagation process in this modified approach remains straightforward, since only member basings, rather than complete basings involving complex type information, are propagated. The use of pseudo creation points instead of creation points solves the second difficulty mentioned above. Values retrieved from composite objects of indefinite gross type are treated as pseudo creation points for which explicit locate instructions are generated if necessary. Values retrieved from composite objects of definite gross type are assumed to inherit basing pointers, and therefore no locate instructions are generated for them.
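
The computation of pseudo creation points, and the decision whether a locate must be emitted at each of them, can be sketched as a backward walk through simple assignments. The Python below is illustrative only; the maps kind, copies_from and retrieved_from are hypothetical encodings of the optimiser's data-flow information. The usage lines mirror the four-line fragment given earlier in this section, in which Z is retrieved from S at L3 and used in F(Z) at L4.

```python
def pseudo_creation_points(x, kind, copies_from):
    """PS-CRTHIS{x}: walk back through simple assignments from x.

    kind[o] is "create" or "retrieve" when o is the ovariable of a value
    creation or value retrieval instruction; any other occurrence is a
    "copy" whose possible sources are listed in copies_from[o].
    """
    result, seen, stack = set(), set(), [x]
    while stack:
        o = stack.pop()
        if o in seen:
            continue
        seen.add(o)
        if kind.get(o, "copy") in ("create", "retrieve"):
            result.add(o)
        else:
            stack.extend(copies_from.get(o, []))
    return result


def locate_points(x, kind, copies_from, retrieved_from, domain_base):
    """Split PS-CRTHIS{x} into points that need an explicit locate and
    points that inherit a basing pointer from a domain-based object."""
    emit, inherit = set(), set()
    for p in pseudo_creation_points(x, kind, copies_from):
        if kind[p] == "retrieve" and retrieved_from[p] in domain_base:
            inherit.add(p)
        else:
            emit.add(p)
    return emit, inherit


kind = {"Y@L1": "create", "Z@L3": "retrieve", "Z@L4": "copy"}
copies_from = {"Z@L4": ["Z@L3"]}          # the Z used in F(Z) comes from L3
retrieved_from = {"Z@L3": "S"}
print(locate_points("Z@L4", kind, copies_from, retrieved_from, {}))
# ({'Z@L3'}, set()): S has no domain base, so L3 needs a locate
print(locate_points("Z@L4", kind, copies_from, retrieved_from, {"S": "B1"}))
# (set(), {'Z@L3'}): a domain-based S passes its basing pointer on
```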

It should be pointed out that the introduction of a base for each live period of a set or map implies that structure conversion will never be required between occurrences of the same variable. This is because two occurrences of composite objects which may transmit values from one to the other, or which may both transmit values to or receive values from a third occurrence, will always have the same basings. We believe that even manual data structure choice will rarely require the type of conversion that our system forbids, and therefore that very little data structuring power will be lost due to this constraint. It should also be noted that the reason we introduce a basing for each live period of a variable having a composite value, rather than simply associating a basing with each variable name, is that a variable might be used for different purposes at different points in a program.

To facilitate the adjustment of modes during phase V, it is convenient to assume that tuples are also based, i.e., that their components are elements of some base set. Introduction of such bases is harmless, because if no composite objects end up being based on them, they will be dropped during subsequent phases.
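
Phase I can be sketched as a connected-components computation followed by base assignment. In the Python below, which is an illustration rather than the SETL specification of the system, FFROM is assumed to be given as a successor map over the occurrences of one variable, and BFROM is taken to be its inverse, so that linking by either map amounts to taking connected components of the undirected graph. The base names B1, B2, ... and the gross-type labels are hypothetical.

```python
DEFINITE = {"set", "map", "tuple"}   # tuples are based too, as noted above


def live_periods(occurrences, ffrom):
    """Group the occurrences of one variable into live periods."""
    adj = {o: set() for o in occurrences}
    for src, dsts in ffrom.items():
        for dst in dsts:
            if src in adj and dst in adj:
                adj[src].add(dst)      # FFROM direction
                adj[dst].add(src)      # BFROM direction
    periods, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            o = stack.pop()
            if o in comp:
                continue
            comp.add(o)
            stack.extend(adj[o] - comp)
        seen |= comp
        periods.append(frozenset(comp))
    return periods


def generate_bases(periods, gross_type):
    """One domain base per live period whose occurrences share one definite
    gross type; periods of indefinite type (e.g. set-or-map) get none."""
    bases = {}
    for i, period in enumerate(periods, start=1):
        types = {gross_type[o] for o in period}
        if len(types) == 1 and types <= DEFINITE:
            bases[period] = f"B{i}"
    return bases


occs = ["S1", "S2", "S3", "S4"]
ffrom = {"S1": ["S2"], "S2": ["S3"]}     # S4 starts a new, unrelated value
periods = live_periods(occs, ffrom)
types = {"S1": "set", "S2": "set", "S3": "set", "S4": "set-or-map"}
print(generate_bases(periods, types))    # only the first live period gets a base
```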


3.5 Phase II : Locate Emission and Base Equivalencing


This phase is central to our system. It secures enforcement of the basings chosen in phase I by generating 'base insertion' ('locate') instructions for all variable occurrences whose values might be incorporated into a composite object. For example, the instruction:

S1 := S with X ;

leads to the basing relation:

X : ∈B ;

where B is the base previously assigned to the variable occurrence of S. This basing relation for X is enforced by emitting 'locate' instructions for the ovariable occurrences belonging to the set PS-CRTHIS{X}. Here, PS-CRTHIS{X} is the set of pseudo creation points of X, i.e., occurrences which are the ovariables of value creation or value retrieval instructions and whose values can be transmitted to X through simple assignment instructions.

A similar approach is taken to map retrieval and store operations. If in phase I the map F has been assigned the mode 'map(∈B1) ∈B2', then the instruction

F(X) := Y ;

will imply the basing relations

X : ∈B1 ;
Y : ∈B2 ;

In this case, locate instructions (into B1 and B2) are emitted for the occurrences in PS-CRTHIS{X} and PS-CRTHIS{Y}, respectively.

Note that these 'locate' instructions are not directly inserted into the code, but are kept in a temporary set, for the following reasons:

A) The bases being used at this stage are not the actual bases which will appear at run time. Actual bases will be determined subsequently by building up equivalence classes of the base names introduced in phase I.

B) Some bases may eventually prove useless, because they support only one composite object, in which case all 'locate' instructions which reference them must be dropped.
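
The derivation of basing relations and the temporary set of tentative locates can be sketched as follows. The three-address instruction tuples, the encoding of phase I modes, and the default of treating an occurrence with no recorded PS-CRTHIS as its own pseudo creation point are assumptions made for this illustration.

```python
def basing_relations(instr, mode):
    """Member-basing relations (occurrence, base) implied by one instruction.

    Hypothetical instruction forms:
        ("with", target, s, x)    # target := s with x
        ("store", f, x, y)        # f(x) := y
        ("fetch", target, f, x)   # target := f(x)
    mode: ("set", B) for a set domain based on B,
          ("map", B1, B2) for a map of mode map(∈B1) ∈B2.
    """
    op = instr[0]
    if op == "with":
        _, _target, s, x = instr
        if s in mode:
            return [(x, mode[s][1])]          # X : ∈B
    elif op == "store":
        _, f, x, y = instr
        if f in mode:
            _, b1, b2 = mode[f]
            return [(x, b1), (y, b2)]         # X : ∈B1 ; Y : ∈B2
    elif op == "fetch":
        _, _target, f, x = instr
        if f in mode:
            # The retrieved value inherits a basing pointer from f,
            # so only the argument has to be located.
            return [(x, mode[f][1])]
    return []


def tentative_locates(instrs, mode, ps_crthis):
    """Collect (pseudo creation point, base) pairs in a temporary set."""
    pending = set()
    for instr in instrs:
        for occ, base in basing_relations(instr, mode):
            for point in ps_crthis.get(occ, {occ}):
                pending.add((point, base))
    return pending


mode = {"S": ("set", "B"), "F": ("map", "B1", "B2")}
code = [("with", "S1", "S", "X"), ("store", "F", "X", "Y")]
ps_crthis = {"X": {"X@1", "X@2"}, "Y": {"Y@3"}}
print(sorted(tentative_locates(code, mode, ps_crthis)))
# [('X@1', 'B'), ('X@1', 'B1'), ('X@2', 'B'), ('X@2', 'B1'), ('Y@3', 'B2')]
```

In this run the pseudo creation points of X are asked to locate into both B and B1; as the next paragraph explains, phase II resolves such situations by equivalencing bases rather than emitting two locates.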

As we proceed in enforcing basing relations, equivalence relations emerge among bases. Suppose that we are in the process of generating a locate instruction to insert the value appearing at a pseudo creation point Y into the base B1 of X. If Y is the ovariable of a value retrieval instruction, and the composite object from which the value is to be retrieved has been domain based on a base B2 (i.e., the composite object is of an unambiguous type and a base has been introduced for it during phase I), so that Y can be expected to be member based on B2, then we just equivalence the bases B1 and B2 without generating any locate instruction. Moreover, even if Y cannot be expected to be member based, but Y has already been assigned a locate instruction which will insert its value into a base B3, we do not generate a new locate instruction either, but just equivalence the bases B1 and B3. Certain other instructions force similar base equivalencing rather than the generation of locate instructions; e.g., set union and intersection force their arguments to have the same base.

The existence of an equivalence relation between two bases B1 and B2 means that B1 and B2 are to be considered as two names of the same actual base B (which will emerge as the representative of the equivalence class to which B1 and B2 belong). The base equivalencing procedure is carried out using the compressed balanced tree technique described by Hopcroft and Tarjan. Equivalence classes of bases are represented by a forest of trees. The root of each tree in this forest is the representative (and is called the real base) of the bases in the tree. Trees are structured using a map PARENT: PARENT(B) points to the parent node of B in the tree containing B if B is not a root; otherwise PARENT(B) is undefined.
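
A minimal Python sketch of this forest follows. As in the text, PARENT(B) is left undefined (absent from the dictionary) for a root, which is the real base of its class. The balancing is done here by size, and real_base() compresses paths; the class and method names are ours, not those of the original system.

```python
class BaseEquivalence:
    """Equivalence classes of base names as a forest of trees."""

    def __init__(self):
        self.parent = {}   # PARENT: base -> parent base; absent for roots
        self.size = {}     # number of bases in each root's tree

    def real_base(self, b):
        """Return the root (real base) of b's tree, compressing the path."""
        root = b
        while root in self.parent:
            root = self.parent[root]
        while b != root:                 # point every base on the path at root
            nxt = self.parent[b]
            self.parent[b] = root
            b = nxt
        return root

    def equivalence(self, b1, b2):
        """Merge the classes of b1 and b2; return the new real base."""
        r1, r2 = self.real_base(b1), self.real_base(b2)
        if r1 == r2:
            return r1
        s1, s2 = self.size.get(r1, 1), self.size.get(r2, 1)
        if s1 < s2:
            r1, r2 = r2, r1
        self.parent[r2] = r1             # hang the smaller tree under the larger
        self.size[r1] = s1 + s2
        self.size.pop(r2, None)
        return r1


eq = BaseEquivalence()
eq.equivalence("B1", "B2")     # e.g. a set union forcing a common base
eq.equivalence("B3", "B4")
eq.equivalence("B2", "B4")
print(eq.real_base("B4"))      # B1
print("B1" in eq.parent)       # False: PARENT(B1) is undefined
```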


The process of base equivalencing and locate generation just described is complicated by the existence of procedure calls and the need to take variable and base scopes into account. For a given variable occurrence VO for which a base BO has been suggested, the following may be the case:

A) VO is an occurrence of a global variable V. Then it is reasonable to assign the same basing to all its occurrences (or, more precisely, to associate one global base with each of its live periods; see above). The base associated with such variables is therefore a global base.

B) VO is an occurrence of a formal parameter of the procedure P. Then, if a base exists for VO, this base is a formal one; each call to P will instantiate it by passing to P some actual base AB (which will be the base of the actual calling parameter AV to which VO corresponds). It is then reasonable to require that all actual parameters at the various points of call have the same form as that chosen for VO, but the actual bases in each case may be distinct, and it would be unwise to equivalence them (since equivalencing more bases than strictly necessary may lead to the creation of very sparse objects). It is, however, reasonable to equivalence all the bases which may appear at a given point of call. This is achieved by partitioning PS-CRTHIS{VO} according to the point of call by which a given occurrence VOX becomes the value of VO. Then the bases occurring in each such partition can be equivalenced. The following example illustrates this idea.


proc A ;
    X1 := {U1, ...} ;
    Y1 := {V1, ...} ;
    C(X1, Y1) ;
end proc A ;

proc B ;
    X2 := {U2, ...} ;
    Y2 := {V2, ...} ;
    C(X2, Y2) ;
end proc B ;

proc C(X, Y) ;
    Z := X + Y ;
end proc C ;

In this case, the bases of X1 and Y1 are equivalenced, and the bases of X2 and Y2 are equivalenced, but the bases of X1 and X2 are not equivalenced.

Note that if VO is not a formal parameter but is nevertheless linked to the formal parameters of P through value flow, then the preceding remarks still apply: VO may be based on a formal base, i.e., some base of the formal parameters of P. In such cases, the same partitioning of PS-CRTHIS according to points of call is used.

C) Finally, VO may be local to P, i.e., it may be a local variable whose value is created only within P and which does not enter into any operation whose other arguments are global or linked to points of call of P. In that case VO (and the other arguments of operations in which VO appears) receives an actual local base.
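
The partitioning rule of case B) can be sketched for the procedure example above, in which the operation X + Y inside C ties together the formal bases of X and Y. In this hypothetical encoding, each actual argument reaching that formal base is reduced to a (point of call, base) pair, and bases are merged only within one point of call. A small union-find is used so that a base which happens to reach the procedure from two points of call still yields consistent equivalence classes.

```python
from collections import defaultdict


def equivalence_per_call_site(actual_bases):
    """actual_bases: (point of call, base) pairs for the actual arguments
    that reach one formal base.  Returns base -> representative."""
    parent = {}

    def find(b):
        while parent.get(b, b) != b:
            b = parent[b]
        return b

    by_site = defaultdict(list)
    for site, base in actual_bases:
        by_site[site].append(base)
    for bases in by_site.values():
        first = find(bases[0])
        for b in bases[1:]:
            root = find(b)
            if root != first:
                parent[root] = first   # merge only within this point of call
    return {b: find(b) for _, b in actual_bases}


rep = equivalence_per_call_site([
    ("C(X1, Y1) in A", "BX1"), ("C(X1, Y1) in A", "BY1"),
    ("C(X2, Y2) in B", "BX2"), ("C(X2, Y2) in B", "BY2"),
])
print(rep)   # {'BX1': 'BX1', 'BY1': 'BX1', 'BX2': 'BX2', 'BY2': 'BX2'}
```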


3.6 Phase III : Locate Insertion

This phase moves 'locate' instructions out of loops. This motion is performed whenever the basing pointer generated by a 'locate' instruction is not actually used within the loop. The following case is typical: a variable X is known to be '∈B', and PS-CRTHIS{X} includes the occurrences of X shown in the following text:

(∀I := 1 ... 100) X := X + Y ;;

Phase II will have hypothesised a locate instruction, initially taken to lie within the loop, for the ovariable X occurring there. However, it is clearly unwise to actually put this locate instruction within the loop, because none of the values assigned to X (except the last one) is used as a base element: the basing pointer is dead within the loop. The proper place for the locate instruction is, of course, the exit from the loop. Generally speaking, a locate instruction can be moved out of an interval if no use is made within the interval of the basing pointer which it generates. This can be ascertained by following FFROM of the (previously) located variable. If we reach an operation which uses the basing pointer within the interval, then the locate instruction cannot be moved. If the use appears in some successor interval, then it is advantageous to move the locate operation to the head of that interval.

The following procedure systematizes the process of locate
instruction motion. We scan the FFROM chain for each occurrence OI
at which a locate instruction has been suggested in phase II. The
scanning procedure continues until we find all the places at which
the basing pointer created at OI might potentially be used. The
intervals which contain these points are called the target
intervals of OI, and a map MOVETO summarizing this information is
generated. If one of the target intervals of OI is the interval in
which OI resides, MOVETO{OI} is defined as nl. We use MOVETO to
insert actual locate instructions as follows. If MOVETO{OI} is nl,
then a locate operation is inserted right after OI is created.
Otherwise, for each interval INT in MOVETO{OI}, a locate operation
is inserted at the entry to the largest interval which includes INT
but not OI.
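The MOVETO computation can be sketched in Python. Here Python stands in for SETL, and its sets and dicts play the roles of SETL sets and maps. The sketch rests on hypothetical inputs that do not come from the text:

- `ffrom` is the FFROM chain, a dict from an occurrence to the occurrences its value reaches.
- `interval_of` gives the innermost interval of each occurrence.
- Intervals nest through a `parent` map.
- `uses_pointer` is the set of occurrences that use the basing pointer.

```python
def enclosing_chain(interval, parent):
    """Return [interval, its parent, ..., outermost interval]."""
    chain = [interval]
    while interval in parent:
        interval = parent[interval]
        chain.append(interval)
    return chain


def target_intervals(oi, ffrom, uses_pointer, interval_of):
    """Follow the FFROM chain from occurrence oi and collect the intervals
    of every occurrence that uses the basing pointer created at oi."""
    targets, seen = set(), {oi}
    pending = list(ffrom.get(oi, ()))
    while pending:
        occ = pending.pop()
        if occ in seen:
            continue
        seen.add(occ)
        if occ in uses_pointer:
            targets.add(interval_of[occ])        # pointer is used here: stop
        else:
            pending.extend(ffrom.get(occ, ()))   # value is only passed on
    return targets


def moveto(oi, ffrom, uses_pointer, interval_of, parent):
    """Return MOVETO{oi}: the intervals at whose entry a locate is placed.
    The empty set (SETL's nl) means: locate right after oi is created."""
    home = interval_of[oi]
    targets = target_intervals(oi, ffrom, uses_pointer, interval_of)
    if home in targets:
        return set()
    home_chain = set(enclosing_chain(home, parent))  # intervals containing oi
    entries = set()
    for t in targets:
        # Intervals that include t but not oi, innermost first.
        outside = [i for i in enclosing_chain(t, parent) if i not in home_chain]
        if not outside:
            return set()   # no such interval: keep the locate at oi (sketch's choice)
        entries.add(outside[-1])   # the largest one
    return entries
```

With `parent = {"LOOP": "PROG", "EXIT": "PROG", "EXIT_BODY": "EXIT"}`, a pointer created in LOOP and first used in EXIT_BODY gets a locate at the entry to EXIT. The text does not cover the case where no interval contains the use without also containing OI. Falling back to nl in that case is a choice made only for this sketch.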

Note that the procedure just described is costly, but it can have
significant advantages in some cases. Nevertheless, a study of
examples seems to indicate that a program normally contains very
few bases, and that very few of them have associated locate
instructions that need to be moved. Overall, we judge that this
locate movement procedure is marginal, and it might be omitted if a
period of experimental testing confirms the judgement just stated.


3.7  Phase IV : Mode Determination

This phase completes the building of the mode descriptor for each
variable occurrence.

It is important to note that a base is useful only if at least two
composite objects are based on it, because then the basing pointers
held by one can be used to access the other. If a base is simply
the domain of a map (and nothing else), then nothing is gained by
its existence, because there is no way to generate elements of that
domain without recalculating the corresponding basing pointer. The
same is true if the only objects supported by a base are a set and
its elements. In such cases, the map (or set) should be unbased.
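The rule that a base must support at least two composite objects can be sketched in Python. The sketch also covers the related step of replacing references to a dropped base by the mode of its elements. Modes are encoded as nested tuples, which is an assumption of this sketch:

- `("in", "B7")` stands for ∈B7.
- `("map", dom, rng)` stands for a map mode.
- A plain string such as `"int"` is an elementary mode.

Only composite objects are listed, so a set together with its elements counts as a single supporting object.

```python
from collections import Counter


def bases_in(mode):
    """Return the set of base names referenced anywhere in a mode."""
    if isinstance(mode, tuple) and mode[0] == "in":
        return {mode[1]}
    if isinstance(mode, tuple):
        found = set()
        for part in mode[1:]:
            found |= bases_in(part)
        return found
    return set()          # elementary mode such as "int"


def drop_useless_bases(composite_modes, element_mode):
    """Drop every base that supports fewer than two composite objects and
    replace each reference to it by the mode of its elements.

    composite_modes: composite object name -> mode
    element_mode:    base name -> mode of that base's elements
    Returns (dropped bases, updated composite_modes)."""
    # Each object counts once per base, however often it mentions it.
    support = Counter(b for m in composite_modes.values() for b in bases_in(m))
    dropped = {b for b, n in support.items() if n < 2}

    def substitute(mode):
        if isinstance(mode, tuple) and mode[0] == "in":
            return element_mode[mode[1]] if mode[1] in dropped else mode
        if isinstance(mode, tuple):
            return (mode[0],) + tuple(substitute(p) for p in mode[1:])
        return mode

    return dropped, {v: substitute(m) for v, m in composite_modes.items()}
```

Applied to the modes of Example 1 in Chapter 4, only B7 is dropped, and POSTORDER becomes map(∈B)int. The sketch does not handle a dropped base whose element mode itself refers to another dropped base.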


In this phase, we also determine whether member basing and/or
domain basing should be associated with the values appearing at
each variable occurrence. The values appearing at certain program
points may need to be given both member basing and domain basing,
because of the subsequent pattern of uses of the value. In this
case, multiple representations are constructed. This is
illustrated by the following example:

(1)  (∀ S ∈ U)
(2)     (while ...)
(3)        (∀ X ∈ S)
(4)           Y := F(X) ;
           end ∀ ;
(5)        if S in T then ... ;
        end while ;
     end ∀ ;

The S in the first instruction should have a dual basing. Its
member basing is useful at the fifth instruction, where S is tested
for membership in T. Its domain basing is useful at the third
instruction, because the X which retrieves values from S is
subsequently subject to a map value retrieval operation at the
fourth instruction. Accordingly, S is assigned two representations,
one with member basing and one with domain basing.


This idea is implemented in three steps.

A) For composite objects and member-based objects, we replace the
member basings referencing dropped bases by the element mode of the
dropped bases. For example, if (in phase I) a set S is tentatively
domain based on a base B1 whose elements are seen (in phase II) to
have the mode [∈B2,∈B2], and if B1 is subsequently dropped (because
it supports only a single composite object), then S is reassigned
the mode 'set([∈B2,∈B2])'.

B) At each occurrence, we determine whether domain basings and/or
member basings are useful by examining the subsequent uses of the
value appearing at that occurrence.

C) We then propagate member basings, starting from locate and value
retrieval instructions, to other occurrences which need basings.
The propagation procedure ensures that proper basings are carried
along with variable values.
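Step C amounts to a forward propagation over the value flow chains. One way to do it is with a worklist, as in the Python sketch below. It assumes a hypothetical `flows_to` map from each occurrence to the occurrences that receive its value. For simplicity, the sketch propagates every basing to every reachable occurrence, not only to the occurrences that step B marked as needing one.

```python
from collections import deque


def propagate_basings(seeds, flows_to):
    """Propagate basings along value flow.

    seeds:    occurrence -> set of basings established there (at locate
              or value retrieval instructions)
    flows_to: occurrence -> occurrences that receive its value
    Returns occurrence -> set of basings carried along with the value."""
    basing = {occ: set(b) for occ, b in seeds.items()}
    work = deque(seeds)
    while work:
        occ = work.popleft()
        for succ in flows_to.get(occ, ()):
            known = basing.setdefault(succ, set())
            fresh = basing[occ] - known
            if fresh:               # revisit only when something new arrives,
                known |= fresh      # so cycles in the flow graph terminate
                work.append(succ)
    return basing
```

For instance, seeding the F at L7 and the S at L8 of Example 2 with ∈B1 would carry that basing to the NODE occurrences at L15 through L17 of GROUPOF, including around the loop.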


3.8  Phase V : Refinement

This phase refines the basing selection made by phases I-IV by
associating one of the representation attributes 'local', 'remote'
or 'sparse' with each set and map representation. The manner in
which this is done reflects the characteristics of the different
representation structures implied by these keywords.


The advantage of local representation of a map over remote
representation lies in the fact that reference to a locally stored
map or set is somewhat faster than reference to a remotely stored
map or set, since at least one level of indirection, and probably
also an out-of-bounds check, can be avoided. On the other hand, an
object having local representation cannot be shared but must be
copied at every point at which its share bit would be set. Boolean
operations such as =, +, etc., are also relatively inefficient for
locally stored sets, compared with the same operations on objects
having remote set representations.

Another significant point concerns the iteration operator in its
relation to based representations. The linked hash table used to
represent unbased sets clearly supports iteration efficiently.
Iterations over sets having based representations, whether local
or remote, are more expensive, because they involve an iteration
over the base and a series of tests for membership in the based
subset. The sparser the based object, the greater the overhead
incurred by iterating over the base. It is therefore necessary to
review, and possibly to revise, our primary basing selection for
objects which are subject to iteration operations.
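A small Python sketch makes the cost difference concrete. For illustration only, it assumes the following simplified layouts, which are not the SETL run-time structures:

- A base is an array of elements, and an element's index serves as its basing pointer.
- A based set is a bit vector with one bit per base element.
- A sparse set is a list of basing pointers.

```python
class Base:
    """A base: each element is stored once, and its index in `elements`
    serves as its basing pointer."""

    def __init__(self, values=()):
        self.elements = []
        self.pointer_of = {}
        for v in values:
            self.locate(v)

    def locate(self, value):
        """Return the basing pointer of value, inserting value if absent."""
        if value not in self.pointer_of:
            self.pointer_of[value] = len(self.elements)
            self.elements.append(value)
        return self.pointer_of[value]


def iterate_bit_vector(base, bits):
    """Iterate a based subset stored as one membership bit per base slot.
    Returns the members and the number of slots that had to be tested."""
    members = [base.elements[p] for p, bit in enumerate(bits) if bit]
    return members, len(bits)


def iterate_sparse(base, pointers):
    """Iterate a sparse based subset stored as a list of basing pointers.
    Returns the members and the number of entries visited."""
    return [base.elements[p] for p in pointers], len(pointers)
```

For a three-element subset of a thousand-element base, the bit-vector iteration tests all 1000 slots, while the sparse form visits only 3 entries. This is the overhead that heuristic A below guards against.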


Moreover, the proper choice among these three representations
depends not only on the kind of use made of the based objects but
also on the frequencies of these uses and the size of the base.
These cannot be discerned statically, so this information is not
available from the SETL optimizer. In the absence of such
information, we adopt a very conservative approach. The heuristics
we apply amount to the following:

A) A based object should be sparse if it is to be iterated upon,
unless we can show that the object is actually identical in value
with its base.

B) If no iteration over an object is performed, but it is an
argument to boolean operations (union, intersection, etc.), or is
passed as a parameter, or is to be copied or inserted into a larger
object, then it should be given remote representation.

C) If only differential updating operations are applied to an
object, and it is never transmitted to another object by
assignment, insertion or call, then it can have a local
representation.
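The three heuristics can be read as a decision procedure, sketched below in Python. The record `ObjectUses` is hypothetical: it summarizes what the earlier phases know about one based object. Falling back to 'remote' for objects that match none of A, B and C is an assumption of this sketch, not a rule stated in the text.

```python
from dataclasses import dataclass


@dataclass
class ObjectUses:
    """Static facts about how one based set or map is used
    (a hypothetical summary record, not defined in the text)."""
    iterated: bool = False
    identical_to_base: bool = False
    boolean_operand: bool = False        # union, intersection, etc.
    passed_as_parameter: bool = False
    copied: bool = False
    inserted_into_larger: bool = False
    only_differential_updates: bool = True
    transmitted: bool = False            # by assignment, insertion or call


def choose_representation(u: ObjectUses) -> str:
    # Heuristic A: iterated objects are sparse unless equal to their base.
    if u.iterated and not u.identical_to_base:
        return "sparse"
    # Heuristic B: objects taking part in whole-object operations.
    if u.boolean_operand or u.passed_as_parameter or u.copied or u.inserted_into_larger:
        return "remote"
    # Heuristic C: updated only differentially and never transmitted.
    if u.only_differential_updates and not u.transmitted:
        return "local"
    # None of A-C applies: remote is this sketch's conservative default.
    return "remote"
```

In Example 2 of Chapter 4, NODES (iterated, not identical to its base) would come out sparse. EDGES (iterated, but identical in value to B2) would fall through to heuristic C and come out local, which matches the choice reported there.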


3.9  Supplementary Remarks

Our main idea is to insert elements into a base B, or to locate
elements in B, along low-frequency paths, and to carry along the
basing pointers thus generated when values are transmitted. This
makes it possible to avoid locate operations in high-frequency
regions. Our technique is therefore a variant of code motion.
However, the motion of locate operations is somewhat different from
the motion of more general expressions. To handle more general
expressions, we must deal with issues of value preservation, while
to handle locate operations we have to deal with the somewhat
different issue of pointer preservation. The basic idea in moving a
general expression EXP is to calculate the value of EXP beforehand
and then to use that value without recalculation at the subsequent
appearances of EXP. This idea is applicable only if the value of
EXP will not be changed between its calculation and its
applications. Therefore, a general expression EXP can be moved
without affecting the logic of the program to which it belongs only
within a region in which none of the arguments of EXP is redefined.
On the other hand, we can move a locate operation by inserting a
value V into a base to derive a basing pointer, and can then avoid
re-executing locate operations by utilizing the basing pointer at
subsequent applications of V. A variable X which will only hold
such 'based' values can therefore be referenced without further
locate operations, as long as the basing pointers of 'based' values
are transmitted along with the values. Thus, motion of locate
operations is possible under weaker restrictions than code motion
of general expressions.
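The payoff of carrying basing pointers can be shown with a toy Python model. It assumes, as our own simplification, that a locate is one hash lookup in a dict-backed base, and that a locally based map is a list indexed by basing pointer. The hypothetical function `tally` references the same values many times in a hot loop. It either locates each value at every reference, or locates it once beforehand and carries the pointer along with the value.

```python
class CountingBase:
    """A base that counts locate operations (hash lookups into the base)."""

    def __init__(self):
        self.elements, self.pointer_of, self.locates = [], {}, 0

    def locate(self, value):
        """Return the basing pointer of value, inserting value if absent."""
        self.locates += 1
        if value not in self.pointer_of:
            self.pointer_of[value] = len(self.elements)
            self.elements.append(value)
        return self.pointer_of[value]


def tally(values, rounds, carry_pointers):
    """Count references to each value over `rounds` passes of a hot loop,
    keeping the counts in a local map: a list indexed by basing pointer.
    Returns (value -> count, number of locate operations performed)."""
    base = CountingBase()
    field = []
    # Cold path: locate each value once and let the pointer travel with it.
    based = [(v, base.locate(v)) for v in values] if carry_pointers else []
    for _ in range(rounds):
        for i, v in enumerate(values):
            # Hot path: reuse the carried pointer, or locate all over again.
            p = based[i][1] if carry_pointers else base.locate(v)
            while len(field) <= p:
                field.append(0)
            field[p] += 1
    counts = {base.elements[p]: n for p, n in enumerate(field)}
    return counts, base.locates
```

Both variants compute the same counts. With three values and 100 rounds, the first performs 300 locates and the second only 3, all of them outside the loop.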

Unlike a manual basing declaration, which assigns each declared
variable one or more basings, our automatic basing choice system
assigns one or more basings to each variable occurrence. This makes
it possible for a variable to have different representations at
different program points. The required representation conversions
for a variable are indicated by simple assignment instructions from
the variable to itself, whose ivariables have the representation
structures to be converted from and whose ovariables have the
representation structures to be converted to. This scheme can very
often eliminate the need to assign a variable multiple
representations throughout the whole program, or to rename some
occurrences of a variable manually and assign a different
representation to the newly created variable. If a variable does
need multiple representations, our algorithm will define the
representation which should be referenced at each ivariable
occurrence and the representation which should be updated at each
ovariable occurrence.

CHAPTER 4 : EXAMPLES

In this chapter, we illustrate the results potentially obtainable
by our automatic data structure choice system by examining the way
in which it would apply to a number of programs written in SETL.
Both the actions and the results of our system are presented. Type
information and value flow chains are assumed to be available from
the SETL optimizer.

To avoid the special problems which would otherwise be connected
with input operations, we insert a set or tuple former instruction
after each 'read' statement, to make the structure and the elements
of the objects explicit. For example, if the F of the statement
'read F' is known to be of type 'map', we replace the original read
statement by the following two instructions:

read F' ;

F := { [EF1 := EF(1), EF2 := EF(2)] : EF ∈ F' } ;


Then we let F' be unbased and try to choose a proper
representation structure for F. Here, the assignments to EF1 and
EF2 are treated as value creations instead of value retrievals.
Although certain expansion instructions of this type could be
inserted by the SETL optimizer as a result of information obtained
by 'backward type analysis', we insert this code explicitly in the
following examples in order to simplify our discussion. Note also
that the necessary insertions can be deduced from 'repr'
declarations, if we agree that such a declaration must be given
for every variable appearing in a 'read' statement.

As our automatic basing choice algorithm deals with variable
occurrences and assigns each of them one or more representation
structures, it is impractical to describe the effect of each phase
of our algorithm on every variable occurrence. To overcome this
expository difficulty, we will use each variable name to represent
all of its occurrences unless otherwise specified. The basing
choice resulting from our algorithm will then be described by an
equivalent manual declaration.

In presenting our examples, we first briefly describe the programs
to be discussed and list the SETL code analysed in each case. Then
the action of our automatic data structure choice algorithm is
tracked phase by phase, and the result expected from each phase is
outlined. The final structure choice expected from our algorithm is
summarized at the end of each example.


4.1  Example 1 : Tree Traversal


As a first example, we study the postorder tree traversal algorithm
given by Knuth [1968]. The inputs to this algorithm are the root of
a tree and two maps defining left and right links respectively. The
function of the algorithm is to give each node of the tree its
ordinal number in postorder.

module TREE-TRAVEL ;

L1 :  read ROOT, ILLINK, IRLINK ;
L2 :  LLINK := { [LK1 := LK(1), LK2 := LK(2)] : LK ∈ ILLINK } ;
L3 :  RLINK := { [RK1 := RK(1), RK2 := RK(2)] : RK ∈ IRLINK } ;
L4 :  POSTORDER := nl ;              $ Initialize the map POSTORDER.
L5 :  STACK := [ ] ;                 $ The stack is represented by
                                     $ a tuple.
L6 :  NODE := ROOT ;
L7 :  ORDINAL := 0 ;
L8 :  GO-ON := TRUE ;

L9 :  (while GO-ON)
L10:     (while NODE /= om)
L11:        STACK || [NODE] ;        $ Push NODE onto STACK.
L12:        NODE := LLINK(NODE) ;    $ Get left descendant.
         end while ;
L13:     if STACK = [ ] then GO-ON := FALSE ;
                                     $ Check whether stack is empty.
         else                        $ If not,
L14:        NODE := STACK(#STACK) ;  $ pop a node from STACK.
L15:        STACK := STACK(1 : #STACK-1) ;
L16:        ORDINAL := ORDINAL + 1 ;
L17:        POSTORDER(NODE) := ORDINAL ;
                                     $ Assign ordinal order to NODE.
L18:        NODE := RLINK(NODE) ;    $ Get right descendant.
         end if ;
      end while ;

L19:  print POSTORDER ;

end module TREE-TRAVEL ;
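For readers without a SETL system, a direct Python transliteration of TREE-TRAVEL may help in following the analysis below. It uses a Python list for the tuple STACK and dicts for LLINK, RLINK and POSTORDER, with a missing key (read as None) playing the role of om. The loop numbers each node after its left subtree and before its right subtree. Today this order is usually called inorder (symmetric order); the word "postorder" here appears to follow the terminology of the 1968 edition cited.

```python
def postorder_numbers(root, llink, rlink):
    """Transliteration of module TREE-TRAVEL.

    llink, rlink: node -> left/right descendant (absent key = om).
    Returns POSTORDER: node -> ordinal number."""
    postorder = {}                      # L4
    stack = []                          # L5: tuple used as a stack
    node = root                         # L6
    ordinal = 0                         # L7
    go_on = True                        # L8
    while go_on:                        # L9
        while node is not None:         # L10
            stack.append(node)          # L11: push NODE
            node = llink.get(node)      # L12: left descendant
        if not stack:                   # L13: stack empty?
            go_on = False
        else:
            node = stack.pop()          # L14-L15: pop a node
            ordinal += 1                # L16
            postorder[node] = ordinal   # L17
            node = rlink.get(node)      # L18: right descendant
    return postorder
```

For the three-node tree with root "b", left child "a" and right child "c", the result numbers a, b and c as 1, 2 and 3.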


Four composite objects occurring in this algorithm have to be
processed: LLINK and RLINK are input maps, POSTORDER is the output
map, and STACK is a work stack represented by a tuple. Phase I
introduces domain and range bases for these objects:

LLINK : map(∈B1)∈B2 ;
RLINK : map(∈B3)∈B4 ;
STACK : tuple(∈B5) ;
POSTORDER : map(∈B6)∈B7 ;

In phase II, we examine the set, map and tuple operations at L2,
L3, L11, L12, L14, L17 and L18. The pseudo creation points of the
occurrences of NODE appearing at these instructions are the NODE at
L12, L14 and L18, and the ROOT at L1. L2 and L3 suggest that LK1,
LK2, RK1 and RK2 are to be elements of the bases B1, B2, B3 and B4,
respectively. Other instructions contribute as follows: L11 and L14
equivalence B2, B4 and B5; L12 equivalences B1, B2 and B4; L17
equivalences B2, B4 and B6; L18 equivalences B2, B3 and B4. The set
of bases is therefore partitioned into two equivalence classes,
{B1,B2,B3,B4,B5,B6} and {B7}. Let the base name B represent the
first equivalence class. The base B7 is dropped, since only the
range of POSTORDER is based on it.
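The equivalencing of bases in phase II is naturally a union-find problem. The Python sketch below replays the equivalences just listed for L11, L14, L12, L17 and L18, and recovers the two classes. The class `BaseClasses` is a hypothetical helper, not part of the described system.

```python
class BaseClasses:
    """Union-find over base names (a hypothetical helper for this sketch)."""

    def __init__(self, names):
        self.parent = {name: name for name in names}

    def find(self, name):
        """Return the representative base of name's class."""
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def equivalence(self, *names):
        """Record that all the given bases must be one and the same base."""
        root = self.find(names[0])
        for name in names[1:]:
            self.parent[self.find(name)] = root

    def classes(self):
        """Return the equivalence classes as sets, in a stable order."""
        groups = {}
        for name in self.parent:
            groups.setdefault(self.find(name), set()).add(name)
        return sorted(groups.values(), key=sorted)


bases = BaseClasses(f"B{i}" for i in range(1, 8))
bases.equivalence("B2", "B4", "B5")   # L11 and L14: STACK holds NODE
bases.equivalence("B1", "B2", "B4")   # L12: LLINK(NODE)
bases.equivalence("B2", "B4", "B6")   # L17: POSTORDER(NODE)
bases.equivalence("B2", "B3", "B4")   # L18: RLINK(NODE)
print(bases.classes())
```

The printed classes are {B1,...,B6} and {B7}, as stated in the text (the element order within each printed set may vary).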

Phase III physically inserts locate instructions at L1, L2 and L3;
these put ROOT, LK1, LK2, RK1 and RK2 into the base B. In phase IV,
the references to B7, ∈B7, are replaced by the mode of the elements
of B7, which is known to be integer. The basings of composite
objects then become


LLINK : map(∈B)∈B ;
RLINK : map(∈B)∈B ;
STACK : tuple(∈B) ;
POSTORDER : map(∈B)int ;


Phase IV also determines the following basings for other
occurrences:

ROOT, NODE : ∈B ;
ORDINAL : int ;
GO-ON : bool ;

Finally, phase V decides that all the composite objects should
have local representations.

4.2  Example 2 : Spanning Tree

The second example we present here is the spanning tree algorithm
given by Low [1974]. This program computes a spanning tree for a
graph. The graph consists of a set of nodes (NODES) and a set of
undirected edges between pairs of nodes (EDGES). The program
assumes that there is a path (through zero or more other nodes)
between every pair of nodes. A spanning tree for the graph is a
subgraph containing all the original nodes and a subset of the
edges of the original graph such that:

1) for any pair of distinct nodes there exists a unique path
between the nodes;

2) there is no path from a node to itself (the subgraph is
cycle-free).

module SPANTREE ;

vars NODES, EDGES, FATHER end vars ;

L1 :  read INODES, IEDGES ;
L2 :  NODES := { ND : ND ∈ INODES } ;
L3 :  EDGES := { ED := [ED1 := EDG(1), ED2 := EDG(2)] : EDG ∈ IEDGES } ;
L4 :  (∀ X ∈ NODES)              $ Initialize the map FATHER.
L5 :     FATHER(X) := X ;;
L6 :  (∀ E ∈ EDGES)
L7 :     F := E(1) ;             $ F and S are the two end nodes.
L8 :     S := E(2) ;
L9 :     FG := GROUPOF(F) ;      $ FG and SG are the roots of the
L10:     SG := GROUPOF(S) ;      $ trees which contain F and S,
                                 $ respectively.
L11:     if FG /= SG then        $ If F and S are not in the same
                                 $ tree, then
L12:        TREESET with E ;
L13:        MERGE(FG, SG) ;      $ their corresponding trees
                                 $ are merged.
         end if ;
      end ∀ ;

L14:  print TREESET ;

proc GROUPOF(NODE) ;             $ To find the root of the tree
                                 $ which contains NODE.
L15:  (while FATHER(NODE) /= NODE)
L16:     NODE := FATHER(NODE) ;
      end while ;
L17:  return NODE ;
end proc GROUPOF ;

proc MERGE(G1, G2) ;             $ To merge two trees.
L18:  FATHER(G2) := G1 ;
L19:  return ;
end proc MERGE ;

end module SPANTREE ;
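A Python transliteration of SPANTREE is given below for readers who want to run the algorithm. FATHER, GROUPOF and MERGE form a union-find forest without path compression or union by rank, exactly as the SETL is written. In this sketch, edges are taken in the order of an input list. SETL iterates over the set EDGES in arbitrary order, and different orders can yield different, equally valid spanning trees.

```python
def spanning_tree(nodes, edges):
    """Transliteration of module SPANTREE.

    nodes: iterable of hashable nodes; edges: iterable of 2-tuples whose
    end points are all in nodes. Returns TREESET, a set of edges."""
    father = {x: x for x in nodes}          # L4-L5: every node its own tree

    def group_of(node):                     # L15-L17: root of node's tree
        while father[node] != node:
            node = father[node]
        return node

    def merge(g1, g2):                      # L18: hang tree g2 below g1
        father[g2] = g1

    treeset = set()
    for e in edges:                         # L6
        f, s = e                            # L7-L8: the two end nodes
        fg, sg = group_of(f), group_of(s)   # L9-L10
        if fg != sg:                        # L11: not yet connected
            treeset.add(e)                  # L12
            merge(fg, sg)                   # L13
    return treeset
```

For nodes 1 to 4 and edges (1,2), (2,3), (1,3), (3,4) taken in that order, the edge (1,3) is rejected because it would close a cycle. The tree then has the 4 - 1 = 3 remaining edges.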

In this module, we deal with six composite objects: NODES is a
set, EDGES and TREESET are sets of pairs, FATHER is a single-valued
map, and ED and E are tuples of length 2. In phase I, we introduce
bases for these variables:

NODES : set(∈B1) ;
EDGES : set(∈B2) ;
TREESET : set(∈B3) ;
FATHER : smap(∈B4)∈B5 ;
ED : tuple(∈B6,∈B7) ;
E : tuple(∈B8,∈B9) ;

In phase II, we examine all set, map and tuple operations. The
information contributed by each instruction is as follows:

L2 :        ND is identified as an element of B1.

L3 :        ED1, ED2 and ED are identified as elements of B6, B7
            and B2 respectively. The mode of B2 becomes
            'base([∈B6,∈B7])'.

L5 :        The pseudo creation point of X is the X at L4; this
            retrieves X from NODES, which is a set based on B1.
            The bases B1, B4 and B5 are therefore identified.

L7, L8 :    The pseudo creation point of E is the E at L6, which
            is the ovariable of a value retrieval instruction. The
            mode of E is identified with the element mode of the
            base of EDGES. Since B2 already has the mode
            'base([∈B6,∈B7])', B8 and B9 are identified with B6
            and B7 respectively.

L12 :       The pseudo creation point of E is its occurrence at
            L6. The mode of E is identified with the element mode
            of the base of EDGES, i.e., B2 and B3 are equivalenced.

L15, L16 :  The pseudo creation points of NODE are the F at L7,
            the S at L8 and the NODE at L16. At L7 and L8, the
            base B4 is equivalenced with B8 and B9. At L16, the
            base B4 is equivalenced with B5.

L18 :       This instruction has the same effect as the
            instructions L15 and L16, since the pseudo creation
            points of G1 and G2 are the same as those of NODE.


The equivalence relations just mentioned lead us to identify the
bases B1, B4, B5, B6, B7, B8 and B9 with each other, and the bases
B2 and B3 with each other. Furthermore, B2 is known to be a base of
elements [∈B1,∈B1]. Phase III inserts explicit locate operations
for ND, ED1, ED2 and ED at L2 and L3. Phase IV propagates the
basings that have been derived to other variables. This yields the
following basings:

E : ∈B2, [∈B1,∈B1] ;   (at L6)

E : ∈B2 ;   (at L12)

F, S, NODE, G1, G2 : ∈B1 ;

Note that the tuple E at L6 is assigned the member basing '∈B2' in
addition to the basing 'tuple(∈B1,∈B1)' which has been assigned to
E during phase I; i.e., E is given multiple based representations.
The occurrences of E appearing at L7 and L8 will make use of the
domain basing, and the E appearing at L12 will make use of the
member basing.

Finally, noting that all the elements of B2 (i.e., the values of ED
at L3) have been inserted into the based subset EDGES of B2, we
conclude that EDGES and B2 are identical (in value). Therefore,
phase V decides that EDGES, as well as the other maps and sets
(except NODES), can be locally based, even though EDGES is subject
to iteration operations. NODES, however, is given sparse structure,
because it is subject to iteration operations and we cannot
identify it with its base.

The basings which result from these choices can be summarized by
an equivalent declaration as follows:

B1 : base ;
B2 : base([∈B1,∈B1]) ;
NODES : sparse set(∈B1) ;
EDGES, TREESET : local set(∈B2) ;
FATHER : local smap(∈B1)∈B1 ;
ED : tuple(∈B1,∈B1) ;
E : ∈B2, tuple(∈B1,∈B1) ;
F, S, NODE, G1, G2 : ∈B1 ;


4.3  Example 3 : Huffman Coding

The third example to be discussed is Huffman's data compaction
algorithm. This algorithm assigns a binary code to each character
in such a way that the most probable characters receive short
codes, while less probable characters receive longer codes.
Huffman's technique is as follows. For a given set of characters,
CHARS, which we assume to be given along with their expected
frequencies, FREQ, take the two characters C1 and C2 of smallest
frequency, and hang them as left and right branches from a newly
created node N, whose heuristic meaning is 'either C1 or C2'. Then
remove C1 and C2 from the set of characters and insert N, taking
its frequency to be the sum of those of C1 and C2. Repeat this
operation until only a single character remains. This process will
grow a tree, the so-called Huffman tree of the set of characters.
The code for a character is then its address as a twig of this
tree, where 'go down to the left' is represented by a binary 0, and
'go down to the right' is represented by a binary 1.

A code much like the following can be found in Schwartz [1973],
pp. 148-151.

module HUFFMAN ;

vars WORK, WFREQ, L, R end vars ;

L1 :  read CHARS, FREQ ;
L2 :  WORK := { CHAR : CHAR ∈ CHARS } ;   $ Initialize workfile.
L3 :  WFREQ := { [WF1 := WF(1), WF2 := WF(2)] : WF ∈ FREQ } ;
L4 :  L := nl ;  R := nl ;       $ L and R are maps from a node to
                                 $ its left and right descendants.

L5 :  (while #WORK > 1)
L6 :     WORK less (C1 := GETMIN(WORK)) ;
L7 :     WORK less (C2 := GETMIN(WORK)) ;
                                 $ Get the two nodes with minimal
                                 $ frequencies.
L8 :     WORK with (N := NEWAT) ;     $ Generate a new node.
L9 :     L(N) := C1 ;                 $ Build a subtree.
L10:     R(N) := C2 ;
L11:     WFREQ(N) := WFREQ(C1) + WFREQ(C2) ;
                                 $ Define the frequency of the new
                                 $ node to be the sum of the
                                 $ frequencies of its descendants.
      end while ;

L12:  CODE := nl ;
L13:  SEQ := NULB ;
L14:  WALK(TOP := arb WORK) ;
L15:  print CODE ;

proc GETMIN(SET) ;               $ Get the node with minimal frequency.
L16:  [KEEP, LEAST] := [Y := arb SET, WFREQ(Y)] ;
L17:  (∀ X ∈ SET)
L18:     if WFREQ(X) < LEAST then [KEEP, LEAST] := [X, WFREQ(X)] ;
      end ∀ X ;
L20:  return KEEP ;
end proc GETMIN ;

proc WALK(T) ;                   $ To generate code for each node.
L21:  if L(T) /= om then
L22:     SEQ || FALSE ;          $ FALSE corresponds to left path,
L23:     WALK(L(T)) ;
L24:     SEQ || TRUE ;           $ TRUE corresponds to right path.
L25:     WALK(R(T)) ;
      else
L26:     CODE(T) := SEQ ;
      end if ;
L27:  SEQ := SEQ(1 : #SEQ-1) ;
end proc WALK ;

end module HUFFMAN ;
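The following Python transliteration of HUFFMAN may be useful for checking the algorithm's behaviour. The sketch makes these assumptions:

- Bit strings are represented as strings of '0' and '1'.
- NEWAT is modelled by tuples `("node", i)`, which cannot collide with the input characters.
- `arb` is modelled by taking the first element that Python's set iteration yields.

Ties in frequency may be broken differently from a SETL run. That can change individual codes, though not the code lengths in the small case checked here.

```python
import itertools


def huffman_codes(freq):
    """Transliteration of module HUFFMAN.

    freq: character -> frequency (FREQ). Returns CODE as a dict from
    character to a string of '0'/'1' characters."""
    work = set(freq)                         # L2: the workfile
    wfreq = dict(freq)                       # L3
    left, right = {}, {}                     # L4
    fresh = itertools.count()

    def getmin(nodes):                       # L16-L20
        keep = next(iter(nodes))             # Y := arb SET
        least = wfreq[keep]
        for x in nodes:
            if wfreq[x] < least:
                keep, least = x, wfreq[x]
        return keep

    while len(work) > 1:                     # L5
        c1 = getmin(work)                    # L6
        work.remove(c1)
        c2 = getmin(work)                    # L7
        work.remove(c2)
        n = ("node", next(fresh))            # L8: a new atom (NEWAT)
        work.add(n)
        left[n], right[n] = c1, c2           # L9-L10
        wfreq[n] = wfreq[c1] + wfreq[c2]     # L11

    code, seq = {}, []                       # L12-L13

    def walk(t):                             # L21-L27
        if t in left:                        # internal node
            seq.append("0")                  # left path
            walk(left[t])
            seq.append("1")                  # right path
            walk(right[t])
        else:
            code[t] = "".join(seq)           # L26: leaf receives its code
        if seq:                              # L27: drop the bit appended by
            seq.pop()                        # the caller (none for the root)

    if work:
        walk(next(iter(work)))               # L14: TOP := arb WORK
    return code
```

For frequencies a:5, b:2, c:1, d:1, the code lengths come out as 1, 2, 3 and 3, and no code is a prefix of another.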

The composite objects to be discussed in this example include two
sets (WORK, SET) and four maps (WFREQ, L, R, CODE). In phase I, we
introduce bases for these variables:

WORK : set(∈B1) ;
SET : set(∈B2) ;
WFREQ : smap(∈B3)∈B4 ;
L : smap(∈B5)∈B6 ;
R : smap(∈B7)∈B8 ;
CODE : smap(∈B9)∈B10 ;

In phase II, we examine the set and map operations appearing in
this algorithm. The information contributed by the various set and
map instructions is as follows:

L2 :        CHAR is identified as an element of B1.

L3 :        WF1 and WF2 are identified as elements of B3 and B4
            respectively.

L6, L7 :    Since the pseudo creation point of both C1 and C2 is
            the X at L17, the base B1 is identified with the base
            B2.

L8 :        N is identified as an element of B1, and a locate
            instruction is emitted.

L9, L10 :   The pseudo creation point of N is the N at L8. Since a
            locate instruction has already been generated for N at
            L8, the bases B5 and B7 are equivalenced with B1. The
            pseudo creation point of C1 and C2 is the X at L17, and
            hence B6 and B8 are equivalenced with B2.

L11 :       B3 is equivalenced with B1.

L16, L18 :  B3 is equivalenced with B2.

L21, L23, L25 : B5 and B7 are equivalenced with B1.

L26 :       Since the pseudo creation point of T is the TOP at
            L14, B9 is equivalenced with B1. Since the pseudo
            creation points of SEQ are the SEQ at L13 and at L27,
            potential locate instructions are generated for these
            points.

This leads us to realize that the bases B1, B2, B3, B5, B6, B7, B8
and B9 are all equivalent. Provisional locate instructions are
generated at L1, L2 and L8. At L8, the insertion of a new atom into
WORK also forces the insertion of the same element into the base
B1. The bases B4 and B10 can now be dropped, because they only
support the ranges of WFREQ and CODE respectively. The range modes
of WFREQ and CODE are seen to be elementary and appear as

WFREQ : smap(∈B1)real ;
CODE : smap(∈B1)bits ;

Phase III actually inserts the locate instructions noted above.
Phase IV propagates the basings assigned to the fundamental objects
CHAR, WF1, WF2 and N, and derives the modes of all other variables,
yielding

C1, C2, N, TOP, KEEP, X, T, NODE : ∈B1 ;
LEAST : real ;
SEQ : bits ;

Finally, phase V decides that WFREQ, L, R and CODE should be
locally based, and that WORK and SET should be sparse sets, since
SET is iterated over at L17.

The data structure choice resulting from our algorithm can be
summarized by an equivalent declaration as follows:

B : base ;
WORK, SET : sparse set(∈B) ;
WFREQ : local smap(∈B)real ;
L, R : local smap(∈B)∈B ;
CODE : local smap(∈B)bits ;
C1, C2, N, TOP, KEEP, X, Y, T, NODE : ∈B ;
LEAST : real ;
SEQ : bits ;


4.4  Example 4 : Maximum Flow

Our fourth example is the well-known algorithm for maximum flow
through a network, justified by the so-called max-flow min-cut
theorem:

Given a network defined by a set of capacities C(P,Q), and given
two points X and Y in the network, the maximum value which any flow
F from X to Y can have is equal to the minimum value of the
expression

    [+: C(P,Q), P ∈ S, Q ∈ S']

(the sum of C(P,Q) over all P in S and Q in S'), where S ranges
over all sets containing X but not Y, and S' is the complement of
S.

The routine takes as arguments a graph defined by a set of pairs
called GRB, a real-valued capacity function FCAP defined for
[P,Q] ∈ GRB, and two distinct nodes X and Y. Its function is to
find the maximum possible flow from X to Y. For an earlier version
of this routine and further discussion concerning it, see Schwartz
[1973], pp. 119-126.

module MAXFLOW ;
   vars GRB, FCAP, GRM, F end vars ;

   macro START(RE) ; RE(2)(if RE(1) then 1 else 2) ; endm ;
   macro FINISH(RE) ; RE(2)(if RE(1) then 2 else 1) ; endm ;

L1 :  read IGRB, IFCAP, X, Y ;
L2 :  GRB := { EGR := [EG1 := EG(1), EG2 := EG(2)] : EG ∈ IGRB } ;
                                          $ graph
L3 :  FCAP := { [EF1 := EF(1), EF2 := EF(2)] : EF ∈ IFCAP } ;
                                          $ capacity function
L4 :  GRM := { [E(1), [TRUE, E]] : E ∈ GRB } + { [E(2), [FALSE, E]] : E ∈ GRB } ;
                                          $ map from each node to its directed edges
L5 :  (∀ E ∈ GRB)
L6 :     F(E) := 0.0 ;                    $ flow function
      end ∀ ;

L7 :  (while P := PATH(X, Y) /= OM)
L8 :     AUXFLOWV := [min : RE ∈ P] CAP(RE) ;
L9 :     (∀ [TVAL, E] ∈ P)
L10:        F(E) := F(E) + if TVAL then AUXFLOWV else -AUXFLOWV ;
         end ∀ ;
      end while ;

L11:  print F ;

proc CAP(RE) ;
L12:  return if RE(1) then FCAP(RE(2)) - F(RE(2)) else F(RE(2)) ;
end proc CAP ;

proc PATH(X, Y) ;
L13:  NEW := {X} ;
L14:  SET := {X} ;
L15:  (while NEW /= nl doing NEW := NEWER)
L16:     NEWER := nl ;                    $ new nodes to be processed
L17:     (∀ V ∈ NEW)                      $ look for next plausible node
L18:        (∀ RE ∈ GRM{V} | (U := FINISH(RE)) notin SET and CAP(RE) > 0)
L19:           PRE(U) := RE ;
L20:           if U = Y then quit while ;
L21:           SET with U ;
L22:           NEWER with U ;
            end ∀ RE ;
         end ∀ V ;
      end while ;

L23:  if U /= Y then return OM ;
L24:  PTH := nl ;
L25:  PT := Y ;
L26:  (while RE := PRE(PT) /= OM doing PT := START(RE))
L27:     PTH with RE ;                    $ construct the path
      end while ;
L28:  return PTH ;
end proc PATH ;

end module MAXFLOW ;
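The Python transliteration of MAXFLOW below makes the routine easy to run and experiment with. It is a sketch under stated assumptions, not the thesis's code:

- The graph is a list of (tail, head) pairs.
- FCAP is a dict keyed by edge.
- The name `max_flow` is ours.

GRM pairs each node with (forward, edge) entries, as L4 does. CAP returns residual capacity, and PATH is the same breadth-first search. Unlike the SETL version, PRE is rebuilt on every call to PATH.

```python
from collections import defaultdict


def max_flow(edges, fcap, x, y):
    """Augmenting-path maximum flow modelled on the SETL module MAXFLOW.

    edges: list of (tail, head) pairs (GRB); fcap: dict edge -> capacity (FCAP).
    Returns the flow function F as a dict edge -> flow.
    """
    grm = defaultdict(list)              # GRM: node -> [(forward, edge), ...]
    for e in edges:
        grm[e[0]].append((True, e))      # leaving the tail: traverse forward
        grm[e[1]].append((False, e))     # leaving the head: traverse backward
    f = {e: 0.0 for e in edges}          # L5-L6: zero initial flow

    def cap(re):                         # L12: residual capacity of a directed edge
        forward, e = re
        return fcap[e] - f[e] if forward else f[e]

    def start(re):                       # macro START
        forward, e = re
        return e[0] if forward else e[1]

    def finish(re):                      # macro FINISH
        forward, e = re
        return e[1] if forward else e[0]

    def path():                          # L13-L28: breadth-first search from x
        pre, seen, new = {}, {x}, [x]
        while new and y not in pre:
            newer = []
            for v in new:
                for re in grm[v]:
                    u = finish(re)
                    if u not in seen and cap(re) > 0:
                        pre[u] = re
                        seen.add(u)
                        newer.append(u)
            new = newer
        if y not in pre:
            return None                  # corresponds to returning OM
        pth, pt = [], y
        while pt != x:                   # walk PRE back from y to x
            re = pre[pt]
            pth.append(re)
            pt = start(re)
        return pth

    while (p := path()) is not None:     # L7-L10: augment along each path found
        aux = min(cap(re) for re in p)
        for forward, e in p:
            f[e] += aux if forward else -aux
    return f


edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow = max_flow(edges, caps, "s", "t")
```

On this five-edge network the flow leaving "s" totals 5. That equals the capacity of the cut separating {s} from the remaining nodes, as the max-flow min-cut theorem predicts.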

In this algorithm there occur six sets (GRB, P, PTH,
NEW, NEWER, SET), four maps (GRM, FCAP, F and PRE), and
two tuples (E, RE). We introduce bases for them as
follows:


GRB    : set(∈B1) ;
P      : set(∈B2) ;
PTH    : set(∈B3) ;
NEW    : set(∈B4) ;
NEWER  : set(∈B5) ;
SET    : set(∈B6) ;
GRM    : mmap(∈B7) set(∈B8) ;
FCAP   : smap(∈B9) ∈B10 ;
F      : smap(∈B11) ∈B12 ;
PRE    : smap(∈B13) ∈B14 ;
E      : tuple(∈B15, ∈B16) ;
RE     : tuple(∈B17, ∈B18) ;


Noting the operations at L2, L3, L6, L10, L12, L13, L14,
L19, L20, L22, L23, L26 and L27, we equivalence the bases
B1, B9, B11 and B18; the bases B2, B3, B8 and B14; and the
bases B4, B5, B6, B7 and B13. The bases B10, B12, B15, B16
and B17 are found to be useless and are dropped. The basings
for composite objects then become:




GRB : set(∈B1) ;

P, PTH : set(∈B2) ;

NEW, NEWER, SET : set(∈B4) ;

GRM : mmap(∈B4) set(∈B2) ;

FCAP, F : smap(∈B1) real ;

PRE : smap(∈B4) ∈B2 ;

RE : tuple(bits, ∈B1) ;
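Phase II's equivalencing of bases behaves like a union-find (disjoint-set) computation: every operation that forces two bases to coincide merges their classes. The sketch below uses a hypothetical `BaseClasses` helper and simply replays the three classes stated above for MAXFLOW. It does not derive those classes from the program's operations, which is the job of the analysis itself.

```python
class BaseClasses:
    """Disjoint-set forest over base names (hypothetical helper)."""

    def __init__(self, names):
        self.parent = {name: name for name in names}

    def find(self, name):
        # Walk to the class representative, halving the path as we go.
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def union(self, a, b):
        # Equivalence two bases: afterwards they share one representative.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def classes(self):
        groups = {}
        for name in self.parent:
            groups.setdefault(self.find(name), set()).add(name)
        return list(groups.values())


bases = BaseClasses(f"B{i}" for i in range(1, 19))
# Replay the three equivalence classes named in the text for MAXFLOW.
for group in (["B1", "B9", "B11", "B18"],
              ["B2", "B3", "B8", "B14"],
              ["B4", "B5", "B6", "B7", "B13"]):
    for other in group[1:]:
        bases.union(group[0], other)

useless = {"B10", "B12", "B15", "B16", "B17"}     # dropped bases
surviving = [g for g in bases.classes() if not g <= useless]
```

Three classes survive. They correspond to the bases written B1, B2 and B4 in the basing declarations above.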

Locate instructions are inserted at L1 to locate X and
Y into B4, at L2 to locate EGR into B1 and to locate EG1
and EG2 into B4, at L3 to locate EF1 into B1, and at L4 to
locate [TRUE, E] and [FALSE, E] into B2. Since each EGR
inserted into B1 has the mode [∈B4, ∈B4], the element mode
of B1 is found to be [∈B4, ∈B4], i.e., we have
'B1 : base([∈B4, ∈B4])'. Similarly, the mode of B2 is found
to be 'base([bits, ∈B1])'. The modes of other variables
are then determined as follows:

X, Y, EG1, EG2, U, V, PT : ∈B4 ;
E, EGR, EF1 : ∈B1 ;
RE : ∈B2 ;
AUXFLOWV : real ;
TVAL : bits ;

Note that the tuple RE at L18 is assigned the member
basing '∈B2' in addition to the basing 'tuple(bits, ∈B1)',
which was assigned to RE during phase I. The
occurrence of RE at L19 will use the domain basing
representation, while the rest of the occurrences of RE
will make use of its tuple representation.

Finally, NEW, NEWER, P, PTH and each image of GRM are
given sparse structure, while other composite objects are
given local structure. The final basing choice can then
be summarized as follows:

B4 : base ;

B1 : base([∈B4, ∈B4]) ;

B2 : base([bits, ∈B1]) ;

GRB : local set(∈B1) ;

P, PTH : sparse set(∈B2) ;

NEW, NEWER : sparse set(∈B4) ;

SET : local set(∈B4) ;

GRM : local mmap(∈B4) sparse set(∈B2) ;

FCAP, F : local smap(∈B1) real ;

PRE : local smap(∈B4) ∈B2 ;

RE : tuple(bits, ∈B1), ∈B2 ;

X, Y, EG1, EG2, U, V, PT : ∈B4 ;

E, EGR, EF1 : ∈B1 ;

AUXFLOWV  :  real  ; 

TVAL  :  bits  ; 


4.5 Example 5: Interval Analysis


As a final example, we apply our algorithm to the
interval analysis program discussed in chapter 2, to see
whether it can generate a basing choice that is compatible
with the manual basing choice described earlier. To clarify
the action of our algorithm in this case, we first list
certain pseudo creation points which play important roles
during the processing:

ENTRY has its pseudo creation point at L02,
ENTRYINT has its pseudo creation point at L09,
J has its pseudo creation points (occurrences of INTS)
   at L12 and L17,
I has its pseudo creation point at L10,
NODE has its pseudo creation point at L15,
Z has its pseudo creation point at L27,
U has its pseudo creation point at L30.

Now let us track the action of our data
structure choice algorithm. In phases I and II, we
introduce bases for composite objects and equivalence the
bases by examining set, map and tuple operations and the
pseudo creation points of their arguments. After phase
II, the resulting basing choice is as follows:


NODES, INTS, FOLLOWERS, SEEN, HEADS, NEWIN : set(∈B) ;
INTOV  : smap(∈B) ∈B ;
CESOR  : smap(∈B) set(∈B) ;
FOLLOW : smap(∈B) set(∈B) ;
NPREDS : smap(∈B) int ;
COUNT  : smap(∈B) int ;
J      : tuple(∈B) ;
SEB    : tuple(set(∈B), ∈B) ;

Phase III inserts locate instructions at L1 and L2 for the
elements of CESOR and ENTRY. Also, a locate operation is
inserted at L17 when J receives a value from the routine
INTERVAL. Phase IV then propagates the member basings
derived from such locate operations to other variables,
and yields the following:

ENTRY, ENTRYINT, I, NODE, ND, X, Y, Z, J : ∈B ;

At this point the tuple J, which has been assigned the mode
'tuple(∈B)', is also assigned the member basing '∈B'. The
analysis performed by phase IV indicates that the member
basing is useful for all the occurrences of J except the
occurrence at L19, which needs domain basing.

The final phase then determines representation
attributes in the manner already described, leading to the
following overall basing declaration:

B : base ;
NODES, INTS, FOLLOWERS, NEWIN : sparse set(∈B) ;
SEEN, HEADS : remote set(∈B) ;
INTOV : local smap(∈B) ∈B ;
CESOR, FOLLOW : local smap(∈B) sparse set(∈B) ;
NPREDS, COUNT : local smap(∈B) int ;
J : tuple(∈B), ∈B ;
SES : tuple(tuple(sparse set(∈B), ∈B)) ;

Encouragingly, the result we have just derived exactly
matches the manual choice presented in chapter 2,
with B corresponding to ALLNODES.
CHAPTER 5 : CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS


Basings incorporate pointer and indexing mechanisms
and reflect relations between the objects appearing in a
program. Using this notion, we have explored a
demonstration system which we expect will be capable of
automating significant aspects of the data structure
choice process. Judging from various test examples, some
of which have been presented in the preceding chapter, our
system should perform well; in general it seems to
produce a highly acceptable basing choice.

However, the system we have described is far from
complete. It uses only a small set of representation
structures. It incorporates certain systematized heuristics
drawn from manual exploration, but does not use other, more
sophisticated data structuring techniques. Much more will
need to be done to master the complicated problem of
automatic data structure choice.

Nevertheless, we believe that the concept and the
system presented in this thesis constitute an essential
first step in automating the data structuring process.
Further improvement can certainly be achieved by pushing
this approach further. We shall now list some of the ideas
that have occurred to us during our research in this area
as possible research topics for the future.


5.1 Merging Rule

The base 'merging rule' described above is one of the
crucial parts of our system. It has been oversimplified:
suggested basings of a variable occurrence are merged
and identified unless they are radically different.
Several possible improvements of the basing choice algorithm
can be realized by modifying the merging rule.

5.1.1 Parallel Member Basings

When more than one member basing is suggested for a
variable, it may not be necessary to identify these
basings. Rather, it may be desirable to keep more than
one basing with the variable and to choose the most
advantageous basing for use at each appearance of the
variable. In particular, this is useful in the case of
two composite objects initialized with the same element
which then grow separately and disjointly. For example,

S1 := {X} ;
S2 := {X} ;
(∀ 0 < Y < 10) S1 with Y ;
(∀ 10 < Y < 20) S2 with Y ;

The current system would give S1 and S2 the same basing;
this is certainly a poor choice, because X is the only
element which is in both S1 and S2, so that a common base
for them would be used sparsely. A better choice is to
let X carry two different basings and to let S1 and S2 be
based on different objects.

5.1.2 Small Object Transmission

If we replace the first two instructions of the
preceding example by the equivalent pair of instructions

S1 := {X} ;
S2 := S1 ;

a different consideration arises. In this case, our
present automatic structure choice system again proceeds
to identify the bases of S1 and S2. However, this is not
what we would like, since S1 and S2 overlap only on X.
A possible solution to this problem would be to
treat the assignment instruction 'S2 := S1' as a potential
point of conversion; in other words, to treat it as if it
read

S2 := {Y : Y ∈ S1} ;

which would make it a value creation operation. In this
case, the bases of S1 and S2 would not be identified.

However, in an approach like this it is not clear when
we should treat an assignment instruction as a potential
conversion and perform such a transformation. A
reasonable heuristic rule would be to perform this
transformation when the value to be transmitted is known
to be a 'small object'. By definition, a small object is
a null set or a composite object which is created by a set
former consisting of a finite number of explicitly listed
elements. A variable is a small-object variable if its
value is known to be a small-object value. This
definition allows 'small-object' to be regarded as a
static attribute of program variables, so that a standard
attribute propagation algorithm can be applied to detect
these cases.
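The sketch below shows the small-object heuristic over straight-line code. It assumes a toy instruction format of (target, kind, operand) triples; that format is ours, not the SETL optimizer's. The propagation rules are:

- A null set or an explicit set former makes its target small.
- A plain copy passes the attribute along. When the source is small, the copy is reported as a conversion point, i.e. rewritten as `S2 := {Y : Y ∈ S1}`.
- Any other instruction that updates the target (for example `with`) clears the attribute.

A full version would iterate to a fixed point over the flow graph and require the attribute on every incoming path at join points.

```python
def small_object_copies(program):
    """Return indices of copy instructions whose source is a small object.

    program: list of (target, kind, operand) triples (hypothetical format):
      'nullset' : target := nl
      'setlit'  : target := {e1, ..., en}, explicitly listed elements
      'copy'    : target := operand
      any other kind (e.g. 'with') updates target in some other way
    """
    small = set()        # variables currently known to hold small objects
    conversions = []
    for index, (target, kind, operand) in enumerate(program):
        if kind in ("nullset", "setlit"):
            small.add(target)
        elif kind == "copy":
            if operand in small:
                conversions.append(index)   # treat as a value creation
                small.add(target)           # the fresh copy is small as well
            else:
                small.discard(target)
        else:
            small.discard(target)           # modified: no longer known small
    return conversions


example = [
    ("S1", "setlit", ["X"]),   # S1 := {X}
    ("S2", "copy", "S1"),      # S2 := S1, a candidate conversion point
    ("S1", "with", "Y"),       # S1 with Y
    ("S2", "with", "Y"),       # S2 with Y
]
conversion_points = small_object_copies(example)   # [1]
```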

5.2 Conversion of Representation Structure

Domain-based objects are given unique representations
by the current basing selection system. On the other
hand, it is clear from examples that appropriately
inserted representation conversions can achieve more
efficient execution. An important case is that in which
the uses of a variable in different regions of a program
suggest different representation attributes. It might be
profitable to convert variable representations at
'bottlenecks' between regions within which different
representation attributes are suggested.

5.3 Conversion of Basings

This is an even more complicated issue than conversion
of representation structure. It can be advantageous to
convert the basing of a based object, e.g., from being
based on one base to being based on another, or from being
based to being unbased. It is, however, unclear when this
kind of conversion will be most profitable and how to detect
situations in which it is profitable at all.

5.4 Conversion of Sparse Objects


We treat based objects over which iterations are
executed as potential sparse sets. Another possible
solution to the problem of how to handle iterations
efficiently is to convert the based set to a list
structure before it is iterated over. Such a scheme can
be profitable if the necessary conversion can be moved out
of frequently executed loops. After conversion is
performed, two different representation structures of the
same value exist, and the list structure can support the
iteration efficiently. However, if a value being iterated
over is modified after conversion but before iteration,
both representations of the value must be updated, adding
to the expense of the scheme. A reasonable compromise
might be as follows. A conditional conversion is inserted
immediately after the last instruction that modifies the
object before the iteration. The density of the object
(the ratio of the cardinality of the object to that of its
base) is examined. If the density is less than a certain
value (i.e., if the object is sparse), a conversion is
carried out. Since the object is not modified between this
point and the iteration, updating of the list created to
support iteration will not be required.
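The conditional conversion can be sketched as follows. The sketch makes three illustrative assumptions, none of them taken from the system:

- A based set is stored as a bit vector over its base, using a Python int as the bitset.
- The density threshold is an arbitrary 0.25.
- The names `BasedSet` and `prepare_for_iteration` are hypothetical.

```python
class BasedSet:
    """A set stored as a bit vector over a base (illustrative only)."""

    def __init__(self, base):
        self.base = list(base)                       # position in list = bit index
        self.index = {e: i for i, e in enumerate(self.base)}
        self.bits = 0

    def add(self, element):
        self.bits |= 1 << self.index[element]

    def cardinality(self):
        return bin(self.bits).count("1")

    def density(self):
        # Ratio of the set's cardinality to that of its base.
        return self.cardinality() / len(self.base) if self.base else 0.0

    def to_list(self):
        # One scan over the whole base; worthwhile only if hoisted out of loops.
        return [e for i, e in enumerate(self.base) if (self.bits >> i) & 1]


def prepare_for_iteration(s, threshold=0.25):
    """Conditional conversion placed after the last modification of s.

    Returns a list to iterate over when s is sparse, or None when the
    bit vector itself should be iterated over.
    """
    if s.density() < threshold:
        return s.to_list()
    return None


base = list(range(100))
s = BasedSet(base)
for e in (3, 50):
    s.add(e)
items = prepare_for_iteration(s)   # density 0.02 < 0.25, so converted: [3, 50]
```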

5.5 Multi-level Basings

The basing system allows a declared base U to be based
on another base V. In such a case, the base U is called an
intermediate base, and the associated relational structure
is said to involve multi-level basings. Conversely, if
there is no intermediate base in a relational structure, it
is said to involve only simple basings.

While our system can only generate simple basings,
multi-level basings can be useful, particularly for sparse
sets. The introduction of an intermediate base B1 (a base
of elements of another base B) can make a based set which
is sparse in the ground base B dense on B1. Changing
the density of based sets can significantly improve the
efficiency of algebraic operations as well as iteration
over based sets. However, extra locate operations are
required whenever an element of the intermediate base
references the ground base for the first time (after the
first such reference, a proper basing pointer can be kept
with the element in the intermediate base). The cost of
element block allocation for the intermediate base can
also increase significantly.
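A small numeric sketch shows the density effect of an intermediate base. It assumes a ground base B of 1000 elements, an intermediate base B1 holding 20 of them, and a based set of 15 elements of B1; the class and counter names are hypothetical. The locate counter shows that the ground-base locate is paid only at each element's first reference. After that, the cached basing pointer is reused.

```python
class IntermediateBase:
    """Base B1 whose elements are drawn from a ground base B (hypothetical helper)."""

    def __init__(self, ground, members):
        self.ground_index = {e: i for i, e in enumerate(ground)}  # locate table for B
        self.members = list(members)
        self.pointer = {}        # cached basing pointers: element -> position in B
        self.locates = 0         # locate operations performed against B

    def ground_position(self, element):
        if element not in self.pointer:
            self.locates += 1    # first reference: pay for one locate in B
            self.pointer[element] = self.ground_index[element]
        return self.pointer[element]


ground = list(range(1000))                            # ground base B
b1 = IntermediateBase(ground, range(0, 1000, 50))     # 20 elements
subset = b1.members[:15]                              # a based set of 15 elements

density_on_ground = len(subset) / len(ground)         # 15 / 1000 = 0.015, sparse on B
density_on_b1 = len(subset) / len(b1.members)         # 15 / 20 = 0.75, dense on B1

for _ in range(3):                                    # repeated references
    for e in subset:
        b1.ground_position(e)                         # 15 locates in total, not 45
```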

5.6 Co-linked Bases

Use of co-linked bases may be regarded as a variation
of multi-level basing. Two bases are co-linked if each
one is declared to be a base of elements of the other. An
example is given by the legal declaration

repr B1 : base(∈B2) ; B2 : base(∈B1) ; end repr ;

After this declaration each element of B1 keeps a pointer
to the corresponding element of B2, and vice versa. A
basing pointer from an element X1 in B1 to the
corresponding element X2 in B2 is established at the first
reference from X1 to X2, and vice versa. After such a
pointer is established, succeeding references from X1 to X2
need no additional locate operations. If X1 never
references the corresponding element of B2, a pointer from
X1 to the corresponding element of B2 need never be
made available. Clearly, co-linked basing is more general
than strict multi-level basing, in the sense that either
base can be regarded as an intermediate base of the other.
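The sketch below models co-linked bases B1 and B2 with lazily established pointers in both directions. It assumes that corresponding elements are matched by a shared key; that correspondence and all names are illustrative assumptions. Each direction pays one locate at its first reference and none afterwards. A pointer that is never needed is never created.

```python
class ColinkedElement:
    def __init__(self, key):
        self.key = key
        self.partner = None      # basing pointer to the corresponding element, set lazily


class ColinkedBase:
    """One of two mutually based bases (hypothetical helper)."""

    def __init__(self, keys):
        self.elements = {k: ColinkedElement(k) for k in keys}
        self.other = None        # the co-linked base
        self.locates = 0         # locate operations performed into the other base

    def link(self, other):
        self.other, other.other = other, self

    def corresponding(self, key):
        x = self.elements[key]
        if x.partner is None:
            self.locates += 1    # first reference from x: locate in the other base
            x.partner = self.other.elements[key]
        return x.partner


B1 = ColinkedBase(["a", "b", "c"])
B2 = ColinkedBase(["a", "b", "c"])
B1.link(B2)
first = B1.corresponding("a")    # locate performed
again = B1.corresponding("a")    # cached pointer reused, no locate
```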
CHAPTER 6 : SETL CODE FOR THE DATA STRUCTURE CHOICE ALGORITHM


module AUTO_DSTRUCT ;


$     Our data structure choice algorithm has been designed
$ to be compatible with the currently implemented SETL
$ optimizer, i.e., analytic information derived by the
$ SETL optimizer can be used directly by our algorithm.
$ For a detailed account of the SETL optimizer, see
$ Grand [1978]. The terms, variable names and data
$ structures used in the SETL optimizer are inherited by our
$ algorithm. Some of the utility macros and subroutines
$ defined in the SETL optimizer are also used by our
$ algorithm without modification.

$     For this reason, we shall now summarize the
$ definitions, constructs and outputs of the SETL optimizer
$ which are relevant to our subsequent discussion.

$ Symbols

$     Each symbol corresponds to a resolved name in a SETL
$ source program or to a compiler-generated temporary.
$ Symbols are represented as atoms which are elements of
$ the base SYMBOLS.

$ The Symbol Table

$     The 'symbol table' is a collection of maps on SYMBOLS.
$ These maps are:

vars
   NAME,       $ name of symbol
   VALUE,      $ value of symbol
   IS_CONST,   $ indicates constant
   IS_BASE,    $ indicates base
   IS_GLOBE    $ indicates global variable
end vars ;


$ Program

$     A program is divided into routines, basic blocks, and
$ instructions. Each instruction consists of an opcode and
$ a tuple of arguments. All the inputs and outputs of an
$ instruction appear explicitly as arguments.

$     The instructions in each block are threaded into a
$ linked list. This is designed to allow maximum
$ flexibility in code insertion and deletion.

$ Maps on Instructions

vars
   NEXT,       $ next instruction in block
   BLOCKOF,    $ gives block containing instruction
   OPCODE,     $ operation code
   ARGS,       $ tuple of arguments
   COPY_FLAG   $ indicates what copy action should be done
end vars ;


$ Macros for Accessing Fields within Instructions

(M1):  macro ARG1(I); ARGS(I)(1) endm;   $ 1st argument
(M2):  macro ARG2(I); ARGS(I)(2) endm;   $ 2nd argument
(M3):  macro ARG3(I); ARGS(I)(3) endm;   $ 3rd argument

$ Iteration over a Program

$     The following macro is used to iterate over the
$ instructions in a block.

(M4):  macro FORALLCODE(B, I);
          init I := FIRST(B); while I /= OM step I := NEXT(I);
       endm;

$ Iteration over the whole program is written:

$     (∀ B ∈ BLOCKS, FORALLCODE(B, I))

$ Occurrences

$     An occurrence is a use or definition of a variable.
$ It is defined to be a pair

$     [instruction identifier, argument number].

$ Occurrences which are inputs are called 'ivariables', and
$ occurrences which are outputs are called 'ovariables'. An
$ occurrence may be both an i- and an o-variable, for example
$ 'F' in 'F(X) := Y'.

$     With the exception of the 'from' operator, ovariables
$ are always the first argument of their instruction.
$ Ivariables may appear in any argument position.

$     The 'from' operator has two arguments, both of which
$ are inputs and outputs.

$     In order to speed up iterations and test the types of
$ occurrences, we provide the following sets and macros:

vars
   ALL_OI,     $ set of all occurrences
   ALL_O,      $ set of all ovariables
   ALL_I,      $ set of all ivariables
   ALL_VARS    $ set of all entries in symbol table
end vars ;


$ The following macros are used in connection with
$ occurrences:

(M5):  macro INSTNO(OI); OI(1) endm;
          $ the instruction which contains OI

(M6):  macro ARGNO(OI); OI(2) endm;
          $ the argument number of OI

(M7):  macro OI_OP(OI); OPCODE(OI(1)) endm;
          $ the operation code of the instruction
          $ containing OI

(M8):  macro OI_NAME(OI); ARGS(OI(1))(OI(2)) endm;
          $ the variable name of OI

(M9):  macro OI_VALUE(OI); VALUE(OI_NAME(OI)) endm;
          $ the value of OI

(M10): macro OI_INTOV(OI); INTOV(BLOCKOF(INSTNO(OI))) endm;
          $ the interval which contains OI

(M11): macro IS_OVAR(OI); (OI in ALL_O) endm;
          $ indicates whether OI is an ovariable

(M12): macro IS_IVAR(OI); (OI in ALL_I) endm;
          $ indicates whether OI is an ivariable

(M13): macro IS_HASHED(OI); OI_OP(OI) in OPS_HASH endm;
          $ indicates whether OI is subject to an
          $ operation involving hashing

(M14): macro IFROMO(O, N); [O(1), N+1] endm;
          $ the N-th ivariable of the instruction
          $ containing O as the ovariable

(M15): macro OFROMI(I); [I(1), 1] endm;
          $ the ovariable of the instruction containing
          $ the ivariable I
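The Python rendering below illustrates the occurrence conventions above. It assumes instructions are stored in dicts OPCODE and ARGS keyed by instruction identifier, mirroring the SETL maps. The sample instruction (`S2 := S1 with X`, encoded with Q1_WITH) and its argument layout are made up for illustration. The sketch shows that an occurrence is just [instruction, argument number]: the ovariable sits in position 1 and the N-th ivariable in position N+1.

```python
# Hypothetical sample data: instruction 10 is  S2 := S1 with X.
OPCODE = {10: "Q1_WITH"}
ARGS = {10: ("S2", "S1", "X")}
OPS_HASH = {"Q1_WITH", "Q1_LESS", "Q1_FROM", "Q1_OF", "Q1_OFA"}


def instno(oi):
    return oi[0]                      # INSTNO(OI) = OI(1)


def argno(oi):
    return oi[1]                      # ARGNO(OI) = OI(2)


def oi_op(oi):
    return OPCODE[instno(oi)]         # OI_OP


def oi_name(oi):
    # OI_NAME: SETL tuples are 1-based, Python tuples 0-based.
    return ARGS[instno(oi)][argno(oi) - 1]


def is_hashed(oi):
    return oi_op(oi) in OPS_HASH      # IS_HASHED


def ifromo(o, n):
    return (instno(o), n + 1)         # IFROMO: N-th ivariable of o's instruction


def ofromi(i):
    return (instno(i), 1)             # OFROMI: ovariable of i's instruction


ovar = (10, 1)                        # the occurrence of S2
```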
$ Opcodes

$     The set OPCODES defines all the operations in the
$ internal program representation. The following lists
$ the opcodes relevant to the subsequent discussion.

const OPCODES :=
{
$ Binary operators
   Q1_ADD,      $ +
   Q1_DIV,      $ /
   Q1_EXP,      $ **
   Q1_EQ,       $ eq
   Q1_IMP,      $ imp
   Q1_IN,       $ in
   Q1_INCS,     $ incs
   Q1_LESS,     $ less
   Q1_LESSF,    $ lessf
   Q1_MOD,      $ //
   Q1_MULT,     $ *
   Q1_NE,       $ ne
   Q1_NOTIN,    $ notin
   Q1_NPOW,     $ npow(n, set)
   Q1_SUB,      $ -
   Q1_SUBSET,   $ subset
   Q1_WITH,     $ with

$ Unary operators
   Q1_UMIN,     $ unary minus

$ Miscellaneous
   Q1_SET,      $ set former
   Q1_SET1,     $ set formed with loop
   Q1_TUP,      $ tuple former
   Q1_TUP1,     $ tuple formed with loop
   Q1_FROM,     $ A1 from A2;

$ Iterators
   Q1_NEXT,     $ A1 := next element of A2
   Q1_NEXTD,    $ A1 := next element of domain A2
   Q1_INEXT,    $ initialize next loop
   Q1_INEXTD,   $ initialize nextd loop

$ Mappings
   Q1_OF,       $ A1 := A2(A3)
   Q1_OFA,      $ A1 := A2{A3}
   Q1_OFB,      $ A1 := A2[A3]
   Q1_SOF,      $ A1(A2) := A3
   Q1_SOFA,     $ A1{A2} := A3
   Q1_SOFB,     $ A1[A2] := A3

$ Assignments - all assign arg2 to arg1
   Q1_ARGIN,    $ assign argument to formal parameter
   Q1_ARGOUT,   $ return value from a function
   Q1_ASN,      $ arg1 := arg2
   Q1_PUSH,     $ push element for set former
   Q1_POP       $ pop read-only argument from stack
};

end const;

$ The opcodes are divided into several categories which are
$ represented as constant sets. These sets are used to
$ drive 'case' statements and as predicates on OPCODES.
$ The classes useful in our subsequent discussion include:

const
(C1):  OPS_ASN :=         $ assignment operators
          { Q1_ASN, Q1_ARGIN, Q1_ARGOUT, Q1_PUSH, Q1_POP } ;

(C2):  OPS_HASH :=        $ operations involving hashing
          { Q1_WITH, Q1_LESS, Q1_FROM, Q1_OF, Q1_OFA } ;

(C3):  OPS_RETRIEVE :=    $ value retrieval operators
          { Q1_ARB, Q1_FROM, Q1_NEXT, Q1_INEXT, Q1_NEXTD,
            Q1_INEXTD, Q1_OF, Q1_OFA, Q1_OFB } ;

(C4):  OPS_CREATE :=      $ value creation operators
          { Q1_ADD, Q1_DIV, Q1_EXP, Q1_LESS, Q1_LESSF, Q1_MOD,
            Q1_MULT, Q1_NPOW, Q1_SUB, Q1_WITH, Q1_UMIN, Q1_SET,
            Q1_SET1, Q1_TUP, Q1_TUP1, Q1_SOF, Q1_SOFA,
            Q1_SOFB } ;
end const;
$ RC-strings

$     In order to make interprocedural analysis precise, we
$ allow the attributes of a variable occurrence OI to vary
$ depending upon how the routine P in which OI appears is
$ invoked. For example, X may be of type 'set' when P is
$ called from an instruction A, but of type 'tuple' when P
$ is called from another instruction B. For this reason
$ most of the attribute maps (e.g., type) used within the
$ SETL optimizer are defined on pairs [OI, RC], where OI is
$ a variable occurrence and RC is a so-called RC-string
$ (Return-Call string), instead of being defined simply
$ on variable occurrences. Logically, an RC-string is the
$ concatenation of a series of return-call phrases.

$     Each return-call phrase is a pair

$         [XXX, INST]

$ where XXX is one of the constants RC_CALL or RC_RETN, and
$ INST is an instruction identifier. Intuitively,
$ [RC_CALL, I] means 'by way of a call at instruction I' and
$ [RC_RETN, I] means 'by way of a return to instruction I'.
$ An entire RC-string is represented as a tuple of these
$ pairs.


$     Technically, when we analyze an attribute of an
$ occurrence OI, we store it as a pair (or set of pairs)
$ [RCS, ATT], where ATT is the attribute and RCS is the
$ return-call path along which the attribute was created.

$     When we propagate an attribute [RCS, ATT] from an
$ occurrence OI to some other occurrence I later in the
$ program, we must begin by finding the return-call path
$ RCS1 which takes us from OI to I. We then set the
$ attribute of I to the pair [RCS cc RCS1, ATT] if this
$ concatenation yields a valid RC-string, and to OM
$ otherwise.

$     The operator 'A cc B' returns the concatenation of
$ two RC-strings A and B if the result is a valid string,
$ and returns the constant ERROR_PATH otherwise. For
$ further detail about RC-strings, see Grand et al. [1978].

$ The following constants are used for RC-strings:

const
(C5):  RC_CALL;             $ indicates call
(C6):  RC_RETN;             $ indicates return
(C7):  NULL_PATH := [ ];    $ null return-call path
(C8):  ERROR_PATH;          $ error path
end const;
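The text defers the precise definition of a valid RC-string to Grand et al. As a stand-in, this sketch assumes the usual matched call/return rule. A return phrase [RC_RETN, I] that follows a still-pending call must return to the same instruction I that made the call, or the string is invalid. A return with no pending call is taken to leave an enclosing routine and is allowed. The constant encodings below are hypothetical.

```python
RC_CALL, RC_RETN = "call", "retn"     # stand-ins for the SETL constants
NULL_PATH = ()
ERROR_PATH = "ERROR_PATH"             # hypothetical sentinel value


def is_valid(rc):
    """Assumed rule: every return must match the innermost pending call."""
    pending = []                      # calls not yet returned from
    for kind, inst in rc:
        if kind == RC_CALL:
            pending.append(inst)
        elif pending and pending.pop() != inst:
            return False              # returned to the wrong call site
    return True


def cc(a, b):
    """The 'A cc B' operator: concatenate, or ERROR_PATH if invalid."""
    joined = tuple(a) + tuple(b)
    return joined if is_valid(joined) else ERROR_PATH


call5 = ((RC_CALL, 5),)
matched = cc(call5, ((RC_RETN, 5),))  # ((RC_CALL, 5), (RC_RETN, 5))
mismatched = cc(call5, ((RC_RETN, 6),))
```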

$ Chaining of Occurrences

$     Certain central algorithms in the SETL optimizer are
$ designed to build up a set of pairs [OI, I], where OI and
$ I are occurrences and there is a path with certain
$ properties from one to the other. We say that these
$ sets 'link' or 'chain' occurrences with certain
$ properties.

$     When we link two occurrences in different procedures,
$ we keep track of the return-call path by which they are
$ linked.

$     The sets built by the various chaining algorithms
$ always have elements of the form:

$         [OI, [P, I]]

$ where OI and I are occurrences linked together along the
$ return-call path P.

$     Three of the most important link maps are:

$ BFROM{I}:

$     If I is an occurrence of some variable V, then BFROM{I}
$ is a set of pairs [P, OI] where P is a return-call path
$ and OI is an occurrence of V such that there is a
$ V-clear path along P from OI to I.

$ FFROM{I}:

$     This is essentially the inverse of BFROM. If I is an
$ occurrence of V, then FFROM{I} is a set of pairs [P, OI]
$ where P is a return-call path and OI is an occurrence of
$ V such that there is a V-clear path along P from I to
$ OI.

$ PS_CRTHIS{I}:

$     This is the value flow map used in our data structure
$ choice algorithm. If I is a variable occurrence, then
$ PS_CRTHIS{I} is a set of pairs [P, OI] where P is a
$ return-call path and OI is the ovariable occurrence of a
$ value creation or value retrieval instruction such that
$ the value of OI can be transmitted to I through simple
$ assignments (along the path P).
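This is an intraprocedural, straight-line sketch of PS_CRTHIS. It ignores return-call paths and branching; both simplifications are assumptions. The creation point of a use is found by walking back through simple assignments (Q1_ASN) to the ovariable of the nearest value creation or retrieval instruction. The opcode sets are abbreviated from OPS_CREATE and OPS_RETRIEVE, and the instruction encoding is hypothetical.

```python
# Abbreviated subsets of OPS_CREATE and OPS_RETRIEVE (an assumption of the sketch).
OPS_CREATE = {"Q1_SET", "Q1_WITH", "Q1_ADD", "Q1_TUP"}
OPS_RETRIEVE = {"Q1_ARB", "Q1_OF", "Q1_NEXT"}


def ps_crthis(program, use_index, var):
    """Creation/retrieval occurrences whose value reaches `var` at `use_index`.

    program: list of (opcode, args) pairs, args[0] being the ovariable.
    Return-call paths are omitted, so the result is a set of occurrences
    (instruction index, argument number) rather than [P, OI] pairs.
    """
    i = use_index - 1
    while i >= 0:
        opcode, args = program[i]
        if args and args[0] == var:          # nearest preceding definition of var
            if opcode in OPS_CREATE or opcode in OPS_RETRIEVE:
                return {(i, 1)}              # the ovariable is argument 1
            if opcode == "Q1_ASN":
                var = args[1]                # follow the simple assignment back
            else:
                return set()                 # defined in some other way
        i -= 1
    return set()


program = [
    ("Q1_SET", ("S1", "X")),          # S1 := {X}
    ("Q1_ASN", ("S2", "S1")),         # S2 := S1
    ("Q1_WITH", ("S3", "S2", "Y")),   # S3 := S2 with Y
    ("Q1_ASN", ("T", "S3")),          # T := S3
]
```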

$ Type Finding

$     The SETL optimizer uses a modified version of
$ Tenenbaum's type finder. This type finding algorithm
$ is interprocedural in nature. The type information it
$ develops gives us a first approximation to the
$ representation structure of each occurrence.

$     The set of basic types forms a Boolean lattice. A
$ point on this lattice is referred to as a gross type.
$ Intuitively, the gross type of an object gives us
$ information about its top-level structure.

$     The type lattice is defined in terms of a set of
$ nodes, namely the gross types, and a MEET and a JOIN
$ function. The gross types are represented as sets of
$ atoms, and MEET and JOIN are represented as set
$ intersection and union, respectively.


const
(C9):   TOM   := {NEWAT};    $ OM
(C10):  TSI   := {NEWAT};    $ short integer
(C11):  TLI   := {NEWAT};    $ long integer
(C12):  TR    := {NEWAT};    $ real
(C13):  TSC   := {NEWAT};    $ short chars
(C14):  TLC   := {NEWAT};    $ long chars
(C15):  TA    := {NEWAT};    $ atom
(C16):  TL    := {NEWAT};    $ label
(C17):  TP    := {NEWAT};    $ procedure
(C18):  KNT   := {NEWAT};    $ known length tuple
(C19):  UNT   := {NEWAT};    $ unknown length tuple
(C20):  GMAP  := {NEWAT};    $ map
(C21):  GSET  := {NEWAT};    $ set
(C22):  TELMT := {NEWAT};    $ element
end const;


$ The following points in the type lattice are also given
$ names:

const
(C23):  TC     := TSC + TLC;    $ characters
(C24):  TI     := TSI + TLI;    $ integers
(C25):  TNUM   := TI + TR;      $ numbers
(C26):  TMTUP  := KNT;          $ tuples of known length
(C27):  THTUP  := UNT;          $ tuples of unknown length
(C28):  TTUP   := KNT + UNT;    $ tuples
(C29):  TSET   := GSET;         $ sets
(C30):  TMAP   := GMAP;         $ maps

(C31):  TG       := TOM + TA + TNUM + TC + TTUP + TSET + TMAP;
(C32):  TZ       := nl;                    $ zero element
(C33):  TSTRUCT  := TTUP + TSET + TMAP;    $ any structured type
(C34):  TZSTRUCT := TG - TSTRUCT;          $ zero for sublattice of
                                           $ structures
(C35):  MAPTUP   := TMAP + TTUP;           $ map or tuple
(C36):  SETTUP   := TSET + TTUP;           $ set or tuple
(C37):  MAPSET   := TMAP + TSET;           $ map or set
end const;


$  The following macros are used to access type lattice
$  elements.


(M16):   macro STRUCTPART(G);   G * TSTRUCT   endm;
                                    $  structure of type G


(M17):   macro IS-PRIM(G);   STRUCTPART(G) = TZ   endm;
                                    $  indicates whether G is a primitive type
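As a reading aid, the lattice constants and the two macros above can be mirrored in a few lines of Python. This sketch is not part of the SETL specification and rests on our own assumptions. Atoms are modeled as strings inside frozensets, and the disjunction of two gross types is taken to be set union. STRUCTPART uses set intersection, as `*` does in (M16). The names `disjoin`, `struct_part`, `is_prim` and `dis_all` are hypothetical stand-ins for the SETL operations.

```python
from functools import reduce

# Hypothetical Python model: each primitive gross type is a fresh atom and
# every lattice point is a frozenset of atoms, mirroring {NEWAT} and '+'.
ATOM_NAMES = ["TOM", "TSI", "TLI", "TR", "TSC", "TLC", "TA", "TL", "TP",
              "KNT", "UNT", "GMAP", "GSET", "TELMT"]
AT = {name: frozenset([name]) for name in ATOM_NAMES}

TC = AT["TSC"] | AT["TLC"]             # characters
TI = AT["TSI"] | AT["TLI"]             # integers
TNUM = TI | AT["TR"]                   # numbers
TMTUP, THTUP = AT["KNT"], AT["UNT"]    # known / unknown length tuples
TTUP = TMTUP | THTUP                   # tuples
TSET, TMAP = AT["GSET"], AT["GMAP"]    # sets, maps
TG = AT["TOM"] | AT["TA"] | TNUM | TC | TTUP | TSET | TMAP
TZ = frozenset()                       # zero element
TSTRUCT = TTUP | TSET | TMAP           # any structured type
SETTUP = TSET | TTUP                   # set or tuple


def disjoin(g1, g2):
    """Disjunction of two gross types: the union of their atoms."""
    return g1 | g2


def struct_part(g):
    """STRUCTPART(G), (M16): G * TSTRUCT, i.e. set intersection."""
    return g & TSTRUCT


def is_prim(g):
    """IS-PRIM(G), (M17): G has no structured component."""
    return struct_part(g) == TZ


def dis_all(gross_types):
    """Fold the disjunction over several possible gross types."""
    return reduce(disjoin, gross_types, TZ)


if __name__ == "__main__":
    print(is_prim(TNUM))                  # True
    print(is_prim(dis_all([TI, TSET])))   # False: the value may be a set
```

The sample calls show the intended reading. A value that is known to be numeric is primitive. A value that might be either an integer or a set is not, because its disjunction carries a structured atom.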


$      Note that the following two criteria are used to
$  regulate the degree to which minor type ambiguities can
$  impact our data structure choice algorithm.

$  (1) We assume that the object S appearing in an instruction
$      'S with X' will be assigned the type TMAP because a map
$      cannot be defined on OM and the current type finder is
$      unable to tell whether the first component of X (if
$      it is a tuple) is OM.

$  (2) We assume that the object T appearing in 'T(X)' will be
$      assigned the type TMTUP only if the value of X is known
$      at compile time (i.e., OI-VALUE(X) is defined);
$      otherwise T should be assigned the type THTUP if it is
$      a tuple.

$  Type Descriptors

$      A type descriptor is a complete description of an
$  object's type.  If an object is primitive, it is
$  described by a pair [GROSSTYP, OM] where GROSSTYP is an
$  element of the type lattice indicating its gross type.
$  If an object is structured, its type is described as a
$  pair [GROSSTYP, COMPTYP] where GROSSTYP is again an element
$  of the type lattice and COMPTYP is a type descriptor for
$  the components of the object.

$      The use of COMPTYP varies slightly for each type of
$  structured object.

$  A. Sets
$       GROSSTYP:  TSET
$       COMPTYP:   type descriptor for elements

$  B. Homogeneous tuples of unknown length
$       GROSSTYP:  THTUP
$       COMPTYP:   type descriptor for components

$  C. Mixed tuples of known length
$       GROSSTYP:  TMTUP
$       COMPTYP:   tuple of type descriptors for components.
$                  If T is a type descriptor for a known length
$                  tuple, then CTYPN(T,N) is the type descriptor
$                  for the N-th component of T, and LENTYP(T) is
$                  the length of the tuple of type descriptors, or
$                  equivalently, the length of the tuple.

$  D. Maps
$       GROSSTYP:  TMAP
$       COMPTYP:   type descriptor for the element type of the map,
$                  namely a known length tuple of length 2.  If T is
$                  a type descriptor for a map, then DOMTYP(T) is a
$                  type descriptor for the domain of T, and RANTYP(T)
$                  is a type descriptor for the range (i.e., F(X))
$                  of T.

$      The output of the type finder is a map called TYPES.
$  If OI is an occurrence and P is a return-call path, then
$  TYPES(OI,P) is a type descriptor giving the type of OI,
$  assuming that the program has proceeded along the
$  return-call path P.

$  The following macros are used for type descriptors:

(M18):   macro GROSSTYP(T);   T(1)   endm;
                                    $  gross type of type descriptor T

(M19):   macro COMPTYP(T);    T(2)   endm;
                                    $  type of elements or components of T

(M20):   macro CTYPN(T, N);   COMPTYP(T)(N)   endm;
                                    $  type of N-th component of T

(M21):   macro LENTYP(T);     (# COMPTYP(T))   endm;
                                    $  length of type descriptor T

(M22):   macro DOMTYP(T);     CTYPN(COMPTYP(T), 1)   endm;
                                    $  domain type of type descriptor T

(M23):   macro RANTYP(T);     CTYPN(COMPTYP(T), 2)   endm;
                                    $  range type of type descriptor T

$  Automatic Data Structure Choice

$      Our data structure choice algorithm utilizes the
$  information derived by the SETL optimizer to determine
$  the basing mode of variable occurrences.  The inputs to
$  our algorithm are:

$      1. The data flow maps BFROM and FFROM, and the value
$         flow map PS-CRTHIS.

$      2. The type map TYPES, which gives the possible types
$         of each occurrence.

$  The output from our algorithm is a map MODE which maps
$  each variable occurrence into an appropriate 'mode
$  descriptor'.  In addition, 'locate' instructions, which
$  are designated by simple assignment instructions with
$  member basing modes for ovariables and non-member basing
$  modes for ivariables, are also inserted into program
$  code.

$      We allow a unique representation structure for each
$  variable occurrence OI, regardless of how the routine in
$  which OI appears is invoked.  Unlike most of the other
$  attribute maps defined in the SETL optimizer, which map
$  variable occurrences to pairs [RC-STRING, ATTRIBUTE], the
$  map MODE will map each variable occurrence into a single
$  mode descriptor.

$  Mode Descriptor

$      A mode descriptor is a complete description of the
$  representation structure of an object.  It has a structure
$  similar to that of a type descriptor, but is represented
$  as a tuple of length four instead of a tuple of length
$  two.  The detailed structure of a mode descriptor is

$        [GROSSTYP, COMPTYP, BASENAM, REPRATT]

$      The first two fields GROSSTYP and COMPTYP have the same
$  meanings as they have in a type descriptor.  However,
$  TELMT, which indicates member basing, and TBASE, which
$  indicates bases, are introduced as new gross types.  When
$  the GROSSTYP of a mode descriptor is TELMT, the BASENAM
$  field contains the name of its base.  The last field,
$  REPRATT, is used to describe the representation attribute
$  of domain based objects.  The allowed representation
$  attributes are defined by


const
(C39):   SPARSE = {NEWAT};     $  sparse representation
(C39):   REMOTE = {NEWAT};     $  remote representation
(C41):   LOCAL  = {NEWAT};     $  local representation
end const;


$  Particular examples of mode descriptors are:

$  1. The mode '∈B' is represented as
$        [TELMT, OM, B, OM].

$  2. The mode 'local set(∈B)' is represented as
$        [TSET, [TELMT, OM, B, OM], OM, LOCAL].

$  3. The mode 'sparse smap(∈B1) ∈B2' is represented as
$        [TMAP, [TMTUP, [[TELMT, OM, B1, OM], [TELMT, OM, B2, OM]]],
$         OM, SPARSE].

$  4. The mode 'base(int)' is represented as
$        [TBASE, TI, OM, OM].

$  Two macros, in addition to those defined on type
$  descriptors, are used to reference mode descriptors.


(M24):   macro BASENAM(M);   M(3)   endm;
                                    $  base name of mode descriptor M

(M25):   macro REPRATT(M);   M(4)   endm;
                                    $  representation attribute of mode
                                    $  descriptor M

$  The following additional macros are used to manipulate
$  mode descriptors of variable occurrences.

(M26):   macro MGTYP(VAR);   GROSSTYP(MODE(VAR))   endm;
                                    $  gross type of a variable occurrence

(M27):   macro GLTYP(VAR);   LENTYP(MODE(VAR))   endm;
                                    $  length of a tuple

(M28):   macro ELMBASE(SET);   BASENAM(COMPTYP(MODE(SET)))   endm;
                                    $  base of a based set

(M29):   macro DOMBASE(MAP);   BASENAM(DOMTYP(MODE(MAP)))   endm;
                                    $  base of the domain of a based map

(M30):   macro RANBASE(MAP);   BASENAM(RANTYP(MODE(MAP)))   endm;
                                    $  base of the range of a map

(M31):   macro COMBASE(TUP, N);   BASENAM(CTYPN(MODE(TUP), N))   endm;
                                    $  base of the N-th component
                                    $  of a tuple

$  Global Variables

$      In addition to the global variables used in the SETL
$  optimizer, we introduce the following global variables
$  in our algorithm.

vars
    MODE,           $  map from occurrences to their mode
                    $  descriptors
    LIVEPDS,        $  set of live periods of variables
                    $  having composite object values
    IS-FORMAL,      $  map on bases to indicate that a
                    $  base only supports the formal
                    $  parameters of a procedure
    NBASES,         $  map on real bases to count the
                    $  number of bases in the same
                    $  equivalence class
    NBASEDON,       $  map on bases to count the number
                    $  of sets and maps based on them
    PARENT,         $  map from bases to their preceding
                    $  nodes in the equivalence class tree
    LOCINS,         $  set of possible 'locate'
                    $  instructions to be inserted
    BASE-ELMTS,     $  set of occurrences which are known
                    $  to be elements of bases
    ID-TO-BASE,     $  map on variable occurrences
                    $  indicating whether the occurrence
                    $  values are identical with their
                    $  bases
    HASH-USE,       $  map on variable occurrences
                    $  indicating whether the occurrence
                    $  values are subsequently subject to
                    $  operations involving hashing
    NON-HASH-USE    $  map on variable occurrences
                    $  indicating whether the occurrence
                    $  values are subsequently subject to
                    $  operations not involving hashing
end vars;


$  Useless Bases


$      Our data structure choice algorithm first introduces a
$  base for each composite object and then equivalences
$  bases.  After this equivalencing procedure, some bases
$  may eventually be found useless.  A base is useful only
$  if at least two composite objects are based on it,
$  because then the basing pointers held by one can be used
$  to access the other.  If a base is simply the domain of
$  a map (and nothing else), then nothing is gained by its
$  existence, because there is no way to generate elements
$  of that domain without recalculating the corresponding
$  basing pointer.  The same is true if the only objects
$  supported by a base are a set and its elements.  In this
$  case, the map (or set) should be unbased.  Consequently,
$  any base which supports only a single composite object
$  is useless, unless the object is a formal parameter of a
$  procedure.  To detect such a case, we provide the
$  following macro.

(M32):   macro CAN-DROP(B);
             NBASEDON(B) = 1 and not IS-FORMAL(B)
         endm;
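For illustration, the test in (M32) can be written as an ordinary predicate. In this hypothetical Python sketch, NBASEDON and IS-FORMAL are plain dictionaries. Undefined entries default to 0 and False; these defaults are our own choice and stand in for SETL's OM.

```python
def can_drop(base, nbasedon, is_formal):
    """CAN-DROP(B), (M32): a base supporting exactly one composite
    object is useless unless it supports a formal parameter."""
    # dict.get defaults stand in for SETL's OM on undefined map entries
    return nbasedon.get(base, 0) == 1 and not is_formal.get(base, False)


# Hypothetical bookkeeping after equivalencing.
NBASEDON = {"B1": 1, "B2": 2, "B3": 1}
IS_FORMAL = {"B1": False, "B2": False, "B3": True}

droppable = sorted(b for b in NBASEDON if can_drop(b, NBASEDON, IS_FORMAL))

if __name__ == "__main__":
    print(droppable)   # ['B1']
```

With this sample data, B2 survives because two objects are based on it. B3 survives because it supports a formal parameter, so only B1 can be dropped.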

$  Representations for the Global Variables Used

$      We now declare the representations for the global
$  variables used by the SETL optimizer.

repr

$  Variables defined in the SETL optimizer

(V1):    SYMBOLS: base;                    $  base of symbols
(V2):    NAME: smap(∈SYMBOLS) char;        $  name of a symbol
(V3):    VALUE: smap(∈SYMBOLS) real;       $  value of a symbol
(V4):    IS-CONST: smap(∈SYMBOLS) bool;    $  indicates constant
(V5):    IS-GLOB: smap(∈SYMBOLS) bool;     $  indicates global variable

(V6):    OI-BASE: base;                    $  base of variable occurrences
(V7):    RC-BASE: base;                    $  base of RC-strings
(V8):    ALL-OI: set(∈OI-BASE);            $  set of all variable
                                           $  occurrences
(V9):    ALL-O: set(∈OI-BASE);             $  set of all ovariable
                                           $  occurrences
(V10):   ALL-I: set(∈OI-BASE);             $  set of all ivariable
                                           $  occurrences
(V11):   BFROM: mmap{∈OI-BASE} set([∈RC-BASE, ∈OI-BASE]);
                                           $  data flow map
(V12):   FFROM: mmap{∈OI-BASE} set([∈RC-BASE, ∈OI-BASE]);
                                           $  data flow map
(V13):   PS-CRTHIS: mmap{∈OI-BASE} set([∈RC-BASE, ∈OI-BASE]);
                                           $  value flow map

(V14):   TYPE-BASE: base;                  $  base of type descriptors
(V15):   TYPES: mmap{∈OI-BASE} set([∈RC-BASE, ∈TYPE-BASE]);
                                           $  possible types of variable
                                           $  occurrences

(V16):   INSTRS: base(int);                $  base of instructions
(V17):   BLOCK-BASE: base;                 $  base of code blocks and
                                           $  intervals
(V18):   OPCODES: base;                    $  base of opcodes
(V19):   NEXT: smap(∈INSTRS) ∈INSTRS;      $  next instruction
(V20):   BLOCKOF: smap(∈INSTRS) ∈BLOCK-BASE;
                                           $  code block containing the
                                           $  specified instruction
(V21):   OPCODE: smap(∈INSTRS) ∈OPCODES;   $  operation code of an
                                           $  instruction
(V22):   ARGS: smap(∈INSTRS) tuple(∈OI-BASE, ∈OI-BASE, ∈OI-BASE);
                                           $  arguments of an instruction

$  Variables particular to the data structure choice
$  algorithm

(V23):   SB-BASE: base;                    $  base of all generated bases
(V24):   IS-FORMAL: smap(∈SB-BASE) bool;   $  indicates whether a base is a
                                           $  formal base
(V25):   NBASES: smap(∈SB-BASE) int;       $  number of bases in the same
                                           $  equivalence class as a given
                                           $  base
(V26):   NBASEDON: smap(∈SB-BASE) int;     $  number of sets and maps based
                                           $  on a base
(V27):   BASE-ELMTS: mmap{∈SB-BASE} set(∈OI-BASE);
                                           $  occurrences inserted into a
                                           $  base

(V28):   MODE-BASE: base;                  $  base of mode descriptors
(V29):   BMODE: map(∈SB-BASE) ∈MODE-BASE;  $  mode of a base
(V30):   MODE: map(∈OI-BASE) ∈MODE-BASE;   $  basing modes of an occurrence

(V31):   LPD-BASE: base;                   $  base of live periods
(V32):   LIVEPDS: set(∈LPD-BASE);          $  set of live periods

(V33):   HASH-USE: smap(∈OI-BASE) bool;    $  map on variable occurrences
                                           $  indicating whether the
                                           $  occurrence values are
                                           $  subsequently subject to
                                           $  operations involving hashing

(V34):   NON-HASH-USE: smap(∈OI-BASE) bool;
                                           $  map on variable occurrences
                                           $  indicating whether the
                                           $  occurrence values are
                                           $  subsequently subject to
                                           $  operations not involving
                                           $  hashing

(V35):   ID-TO-BASE: smap(∈OI-BASE) bool;  $  map on variable occurrences
                                           $  indicating whether the
                                           $  occurrence values are
                                           $  identical with their bases

end repr;
$  Program Organization

$      To make it easier to understand the global structure
$  of our algorithm, we outline its major subroutines in
$  their calling hierarchy.

$  (P1):    GENBASE                 -  generate bases
$  (P2):       CONSTR-PS-CRTHIS     -  construct PS-CRTHIS map
$  (P3):    GENLOCS                 -  generate locate instructions
$  (P4):       MERGEOBJ             -  process set algebraic operations
$  (P5):       PROPELMT             -  process set insertion operations
$  (P6):       PROPOFMAP            -  process map retrieval operations
$  (P7):       PROPSOFMAP           -  process map storage operations
$  (P8):       PROPOFTUP            -  process tuple retrieval operations
$  (P9):       PROPSOFTUP           -  process tuple storage operations
$  (P10):      PROPSOFAMAP          -  process map range storage
$                                      operations
$  (P11):         MERGE             -  merge bases
$  (P12):            MERGE-INTO     -  merge the mode of an occurrence
$                                      with the element mode of a base
$  (P13):      INSERTLOCS           -  insert locate instructions
$  (P14):         EQUIV             -  equivalence bases
$  (P15):            REALB          -  find real bases
$  (P16):            MODEDIS        -  calculate mode disjunction
$  (P17):         PARTITION         -  partition pseudo creation points
$  (P18):            LASTCALL       -  find last calling point of
$                                      a procedure
$  (P19):   MOVELOCS                -  move locate instructions
$  (P20):   UPDMODES                -  update occurrence modes
$  (P21):      MODECMPRS            -  compress mode descriptors
$  (P22):      SUBSTMD              -  substitute mode descriptors
$  (P23):      USE-DETERM           -  determine uses of variable values
$  (P24):      BASING-PROP          -  propagate basing mode
$  (P25):   REFINE                  -  refine occurrence modes
$  (P26):      ID-BASE              -  verify occurrences identical
$                                      with bases
$  (P27):      SETOF                -  find sets constructed by set
$                                      formers
$  (P28):      MAKE-REMOTE          -  choose remote representations

$  Cross-reference Listing of Global Names

$      For reference purposes, we list in Appendix B all
$  global names used in our algorithm in alphabetical
$  order.

$  SETL Code

$      Now we are ready to present the code.

proc AUTO-DATA public;

$  This is the main routine of our algorithm.

    $  Initialize global variables.

    MODE := NBASES := LIVEPDS := LOCINS := NBASEDON := PARENT := nl;
    BASE-ELMTS := ID-TO-BASE := HASH-USE := NON-HASH-USE := nl;

    GENBASE();          $  invoke phase I
    GENLOCS();          $  invoke phase II
    MOVELOCS();         $  invoke phase III
    UPDMODES();         $  invoke phase IV
    REFINE();           $  invoke phase V

end AUTO-DATA;


proc GENBASE;

$  The purpose of this procedure is to improve the efficiency
$  of the subsequent phases.

$  This procedure generates a base for each live period of a
$  composite object.  A live period is used here to mean a
$  set of occurrences of a given variable which are linked
$  by the chaining maps FFROM and BFROM and can therefore
$  be expected to have the same basing.  However, bases are
$  generated only for the live periods in which all the
$  occurrences have the same gross type.  No bases are
$  generated for live periods which consist of occurrences of
$  indefinite gross type (e.g., SETTUP and MAPTUP).
$  Objects of indefinite gross type can never be domain
$  based.  Each base generated in this phase initiates a
$  separate equivalence class.  Equivalence classes will be
$  merged in phase II.

$  To facilitate the adjustment of modes during phase V, it
$  is convenient to assume that tuples are also based,
$  i.e., that their components are also elements of some
$  bases.  Introduction of such bases is harmless, because
$  if no composite objects end up being based on them, they
$  will be dropped.

$  This routine is called by the main routine AUTO-DATA and
$  calls the routine CONSTR-PS-CRTHIS to construct the
$  PS-CRTHIS map for subsequent use.  This routine also
$  calls a utility routine DIS in the type finder to find
$  the disjunction of a set of type descriptors.

$  The global variables referenced by this routine include
$      LIVEPDS        -  set of live periods
$      NBASES(B)      -  number of bases in the same equivalence
$                        class as B
$      NBASEDON(B)    -  number of sets and maps based on base B
$      IS-FORMAL(B)   -  indicates whether B is a formal base
$      ALL-OI         -  all variable occurrences
$      MODE(OI)       -  mode of occurrence OI
$      BFROM{OI}      -  occurrences to which OI is directly
$                        linked
$      FFROM{OI}      -  occurrences which are directly linked
$                        to OI
$      TYPES{OI}      -  possible types of occurrence OI

$  The macros used in this routine include
$      OI-NAME(OI)    -  name of occurrence OI, see (M8)
$      GROSSTYP(T)    -  gross type of type descriptor T, see (M18)
$      LENTYP(T)      -  length of type descriptor T, see (M21)
$      COMPTYP(M)     -  element mode of mode descriptor M, see (M19)
$      DOMTYP(M)      -  domain mode of mode descriptor M, see (M22)
$      RANTYP(M)      -  range mode of mode descriptor M, see (M23)
$      CTYPN(M, I)    -  I-th component mode of mode descriptor M,
$                        see (M20)


$  The local variables defined in this routine are

repr
    TODO: set(∈OI-BASE);          $  workpile of variable occurrences
    WORK: set(∈OI-BASE);          $  workpile of variable occurrences
    TPO, TYP, T: ∈TYPE-BASE;      $  type descriptors
    OI, WOI: ∈OI-BASE;            $  variable occurrences
    BASE: ∈SB-BASE;               $  base
    NEWM: ∈MODE-BASE;             $  mode descriptor
    LPD: ∈LPD-BASE;               $  live period
    L: int;                       $  length of tuple
end repr;

    $  Initialize TODO to be the set of all variable occurrences.

    TODO := ALL-OI;

    $  An initial mode descriptor is assigned to each variable
    $  occurrence.

    (while TODO /= nl)

        OI := arb TODO;                 $  Get an occurrence.
        T := .DIS/ {TYP : [-, TYP] ∈ TYPES{OI}};
                                        $  Disjunction of all possible
                                        $  types of OI.

        $  If OI is of primitive type, take its type as the
        $  initial mode descriptor.

        if IS-PRIM(GROSSTYP(T)) then
            MODE(OI) := T;
            TODO less OI;
            continue while TODO;        $  Process next occurrence.
        end if;

        $  Otherwise, OI is a composite object.  Construct the
        $  live period containing OI.

        WORK := {OI};                   $  Initialize a workpile.
        LPD := nl;                      $  Start a new, empty live period.

        (while WORK /= nl)

            OI from WORK;               $  Choose an arbitrary element
                                        $  from WORK.
            T := .DIS/ {TYP : [-, TYP] ∈ TYPES{OI}};
                                        $  Disjunction of all possible
                                        $  types of OI.
            MODE(OI) := T;              $  Use type as initial mode.
            TODO less OI;               $  OI need not be processed any
                                        $  more.
            LPD with OI;                $  OI is included into the
                                        $  current live period.

            $  Insert the occurrences which are linked to OI and
            $  have not been processed into the workpile WORK.

            WORK +:= {WOI : [-, WOI] ∈ (FFROM{OI} + BFROM{OI}) | WOI in TODO};

        end while WORK;

        $  LPD is a complete live period.
        LIVEPDS with LPD;


        T := .DIS/ {TYP : OI ∈ LPD, [-, TYP] ∈ TYPES{OI}};
                                        $  Disjunction of all possible
                                        $  types of the occurrences in
                                        $  LPD.

        TPO := GROSSTYP(T);             $  gross type of T

        $  If all the occurrences in the live period LPD have the
        $  same definite composite type, construct a domain
        $  basing mode for all of them.  The gross type is
        $  taken as the initial mode descriptor.  This mode
        $  descriptor will be completed subsequently.

        NEWM := [TPO];                  $  template for mode descriptor

        case TPO of


        (TSET):

            $  If every occurrence OI in LPD is a set, generate a
            $  base for OI.  Give OI the mode set(∈BASE) by
            $  inserting the member basing ∈BASE into the mode
            $  descriptor NEWM for the elements of OI.  NEWM
            $  will become [TSET, [TELMT, OM, BASE]].

            COMPTYP(NEWM) := [TELMT, OM, BASE := NEWAT];

            $  Set NBASEDON(BASE) := 1 to indicate that OI is
            $  domain based on BASE.

            NBASEDON(BASE) := 1;

            $  Let BASE form an equivalence class.

            NBASES(BASE) := 1;

            $  Initialize the mode of BASE.

            BMODE(BASE) := [TBASE, COMPTYP(T)];

            $  Initialize BASE to be a formal base.

            IS-FORMAL(BASE) := TRUE;


        (TMAP):

            $  If every occurrence OI in LPD is a map, generate
            $  two bases for OI, one for its domain and the
            $  other for its range.  Give OI the mode
            $  map(∈BASE1) ∈BASE2 by inserting the mode
            $  [∈BASE1, ∈BASE2] into the mode descriptor NEWM for
            $  the elements of OI.  NEWM will become
            $  [TMAP, [TMTUP, [[TELMT, OM, BASE1], [TELMT, OM, BASE2]]]].

            COMPTYP(NEWM) := [TMTUP, []];

            DOMTYP(NEWM) := [TELMT, OM, BASE1 := NEWAT];
            RANTYP(NEWM) := [TELMT, OM, BASE2 := NEWAT];

            $  Set NBASEDON(BASE1) := 1 to indicate that OI is
            $  domain based on BASE1.

            NBASEDON(BASE1) := 1;

            $  Let BASE1 and BASE2 each form an equivalence
            $  class.

            NBASES(BASE1) := 1;
            NBASES(BASE2) := 1;

            $  Initialize the modes of BASE1 and BASE2.

            BMODE(BASE1) := [TBASE, DOMTYP(T)];
            BMODE(BASE2) := [TBASE, RANTYP(T)];

            $  Initialize BASE1 and BASE2 to be formal bases.

            IS-FORMAL(BASE1) := TRUE;
            IS-FORMAL(BASE2) := TRUE;


        (THTUP):

            $  If every occurrence OI in LPD is a tuple of unknown
            $  length, generate a base on which all the
            $  components of OI are to be based.  Give OI the mode
            $  tuple(∈BASE) by inserting the member basing
            $  ∈BASE into the mode descriptor NEWM for the
            $  components of the tuple.  NEWM will become
            $  [THTUP, [TELMT, OM, BASE]].

            COMPTYP(NEWM) := [TELMT, OM, BASE := NEWAT];

            $  Let BASE form an equivalence class.

            NBASES(BASE) := 1;

            $  Initialize the mode of BASE.

            BMODE(BASE) := [TBASE, COMPTYP(T)];

            $  Initialize BASE to be a formal base.

            IS-FORMAL(BASE) := TRUE;


        (TMTUP):

            $  If every occurrence OI in LPD is a tuple of known
            $  length, generate a base for each component of OI
            $  and let the component be based on this base.

            L := LENTYP(T);             $  length of tuple OI
            COMPTYP(NEWM) := [];

            (∀I := 1...L)

                $  Generate a base for the component and insert
                $  the member basing ∈BASE into the mode
                $  descriptor NEWM for the component.  NEWM
                $  will eventually become
                $      [TMTUP, [[TELMT, OM, BASE], ...]].

                CTYPN(NEWM, I) := [TELMT, OM, BASE := NEWAT];

                $  Let BASE form an equivalence class.

                NBASES(BASE) := 1;

                $  Initialize the mode of BASE.

                BMODE(BASE) := [TBASE, CTYPN(T, I)];

                $  Initialize BASE to be a formal base.

                IS-FORMAL(BASE) := TRUE;

            end ∀I;

        else

            $  Otherwise, at least one of the occurrences in LPD
            $  is of indefinite gross type.  In this case, no
            $  bases are generated.

            continue while TODO;        $  Process next occurrence.

        end case;

        $  All occurrences in LPD have the same definite gross
        $  type.  Assign the mode descriptor just constructed
        $  to all occurrences in LPD.

        (∀OI ∈ LPD)
            MODE(OI) := NEWM;           $  Assign OI the mode descriptor
                                        $  NEWM.
        end ∀;

    end while TODO;

    $  Call the routine CONSTR-PS-CRTHIS to construct the map
    $  PS-CRTHIS for subsequent use.

    CONSTR-PS-CRTHIS();

    return;

end proc GENBASE;
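The core of GENBASE is a workpile computation of connected components, followed by the choice of a mode template for each component. The Python sketch below follows that structure under simplifying assumptions of our own:

- FFROM and BFROM are merged into one adjacency map without RC-strings.
- Each occurrence carries a single gross-type name instead of a set of type descriptors.
- Known-length tuples are skipped.
- The bookkeeping maps NBASES, NBASEDON, BMODE and IS-FORMAL are omitted.

`genbase`, `new_base` and `PRIMITIVE` are hypothetical names.

```python
import itertools

# Sketch of phase I under simplifying assumptions (all hypothetical):
# occurrences are strings, links[oi] merges FFROM{oi} and BFROM{oi} with
# the RC-strings dropped, and gross[oi] is one gross-type name rather than
# the disjunction of a set of type descriptors.
PRIMITIVE = {"TI", "TR", "TC", "TA", "TOM"}
_fresh = itertools.count(1)


def new_base():
    """Stand-in for NEWAT: return a fresh base name."""
    return f"B{next(_fresh)}"


def genbase(occurrences, links, gross):
    mode, periods = {}, []
    todo = set(occurrences)
    while todo:
        oi = min(todo)                      # deterministic stand-in for arb
        if gross[oi] in PRIMITIVE:
            mode[oi] = [gross[oi], None]    # its type is its mode
            todo.discard(oi)
            continue
        work, lpd = {oi}, set()             # workpile and a new live period
        while work:
            w = work.pop()
            mode[w] = [gross[w], None]      # type as initial mode
            todo.discard(w)
            lpd.add(w)
            work |= {x for x in links.get(w, ()) if x in todo}
        periods.append(lpd)
        kinds = {gross[x] for x in lpd}
        if len(kinds) != 1:
            continue                        # indefinite gross type: no bases
        (tpo,) = kinds
        if tpo == "TSET":
            newm = ["TSET", ["TELMT", None, new_base()]]
        elif tpo == "THTUP":
            newm = ["THTUP", ["TELMT", None, new_base()]]
        elif tpo == "TMAP":
            newm = ["TMAP", ["TMTUP", [["TELMT", None, new_base()],
                                       ["TELMT", None, new_base()]]]]
        else:
            continue                        # TMTUP omitted in this sketch
        for x in lpd:
            mode[x] = newm                  # one shared descriptor per period
    return mode, periods


if __name__ == "__main__":
    links = {"s1": {"s2"}, "s2": {"s1"}, "f1": set(), "x": set(),
             "t1": {"t2"}, "t2": {"t1"}}
    gross = {"s1": "TSET", "s2": "TSET", "f1": "TMAP", "x": "TI",
             "t1": "TSET", "t2": "THTUP"}
    mode, periods = genbase(list(gross), links, gross)
    print(mode["s1"] is mode["s2"], mode["t1"])   # True ['TSET', None]
```

Sharing one descriptor object across a live period mirrors the final loop of GENBASE, which assigns the same NEWM to every occurrence in LPD. A period that mixes a set with a tuple keeps only its initial modes and receives no bases.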


proc CONSTR-PS-CRTHIS;

$  This routine constructs the PS-CRTHIS map.  We start with the
$  ovariables of value creation and value retrieval
$  instructions (these are the total set of pseudo creation
$  points in the whole program), and assign them as the
$  pseudo creation points of themselves.  The pseudo
$  creation map is then propagated through the FFROM map and
$  simple assignment instructions: a pseudo creation point
$  of an occurrence OI must be a pseudo creation point of
$  every occurrence in FFROM{OI}, and a pseudo creation
$  point of the ivariable of a simple assignment
$  instruction must be a pseudo creation point of the
$  ovariable of the instruction.

$  The workpile WORK consists of elements having the format

$        [OI, [P, POI]]

$  where POI is a pseudo creation point of OI and P is the
$  path from POI to OI.

$  This routine is called by the routine GENBASE.

$  The global variables referenced by this routine include
$      BLOCKS            -  set of code blocks
$      OPCODE(I)         -  operation code of instruction I
$      PS-CRTHIS{OI}     -  pseudo creation points of OI
$      FFROM{OI}         -  occurrences which are directly linked
$                           to OI

$  The macros used in this routine include
$      FORALLCODE(B, I)  -  for each instruction I in block B,
$                           see (M^).
$      IS-IVAR(OI)       -  indicates whether OI is an ivariable,
$                           see (M12).
$      OFROM(OI)         -  the ovariable in the same instruction
$                           as the ivariable OI, see (MIS).

$  The local variables defined in this routine are

repr
    OVAR, OI, POI, WOI: ∈OI-BASE;   $  variable occurrences
    WORK: set(∈OI-BASE);            $  workpile of occurrences
    I: ∈INSTRS;                     $  instruction
    B: ∈BLOCK-BASE;                 $  code block
    P, NP, WP: ∈RC-BASE;            $  RC-strings
end repr;

    $  Assign the ovariables of value creation and value
    $  retrieval instructions as the pseudo creation points of
    $  themselves, and insert them into the workpile WORK.

    (∀B ∈ BLOCKS, FORALLCODE(B, I) |
            OPCODE(I) in (OPS-CREATE + OPS-RETRIEVE))

        OVAR := ARG1(I);                $  ovariable of the instruction
        PS-CRTHIS{OVAR} := {[NULL-PATH, OVAR]};
        WORK with [OVAR, [NULL-PATH, OVAR]];

    end ∀B;

    $  Process elements in WORK until WORK is empty.

    (while WORK /= nl)

        $  Retrieve an element from WORK.

        [OI, [P, POI]] from WORK;

        $  POI is a pseudo creation point of OI and P is the path
        $  from POI to OI.

        $  For each occurrence WOI in FFROM{OI} which can be
        $  reached from POI, POI is a pseudo creation point of
        $  WOI.

        (∀[WP, WOI] ∈ FFROM{OI} | (NP := P .CC WP) /= ERROR-PATH)

            $  NP is the path from POI to WOI.

            $  Insert [NP, POI] into PS-CRTHIS{WOI} if it has not
            $  been inserted in PS-CRTHIS{WOI} yet.

            if [NP, POI] notin PS-CRTHIS{WOI} then
                PS-CRTHIS{WOI} with [NP, POI];
                WORK with [WOI, [NP, POI]];
            end if;

        end ∀;

        $  If OI is the ivariable of a simple assignment
        $  instruction, the pseudo creation points of OI are
        $  also the pseudo creation points of the ovariable of
        $  the instruction.

        if IS-IVAR(OI) and OI-OP(OI) in OPS-ASN then
            WOI := OFROM(OI);           $  ovariable of the instruction
            PS-CRTHIS{WOI} with [P, POI];
            WORK with [WOI, [P, POI]];
        end if;

    end while WORK;

    return;

end proc CONSTR-PS-CRTHIS;
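The propagation performed by CONSTR-PS-CRTHIS is a workpile fixpoint computation. The sketch below is a simplified Python rendering, not a transcription of the SETL code. RC-strings are dropped, and with them the ERROR-PATH test. Occurrences are strings, and the assignment edge given by OFROM is passed in as a dictionary. Unlike the SETL code, the sketch applies the membership test to the simple-assignment step as well as to the FFROM step. This does not change the resulting sets and makes termination evident. `construct_ps_crthis` is a hypothetical name.

```python
from collections import defaultdict


def construct_ps_crthis(creators, ffrom, assign_target):
    """Workpile propagation of pseudo creation points (RC-strings dropped).

    creators:      ovariables of value creation/retrieval instructions
    ffrom[oi]:     occurrences reached directly from oi (FFROM{oi})
    assign_target: maps the ivariable of a simple assignment to the
                   ovariable of the same instruction (OFROM)
    """
    ps = defaultdict(set)
    work = set()
    for ovar in creators:
        ps[ovar].add(ovar)                  # its own pseudo creation point
        work.add((ovar, ovar))
    while work:
        oi, poi = work.pop()                # poi is a creation point of oi
        successors = list(ffrom.get(oi, ()))
        if oi in assign_target:             # ivariable of 'Y := X'
            successors.append(assign_target[oi])
        for woi in successors:
            if poi not in ps[woi]:
                ps[woi].add(poi)
                work.add((woi, poi))
    return dict(ps)


if __name__ == "__main__":
    # Two definitions of A (def1, def2) reach the use useA in 'B := A';
    # the assignment defines B at defB, which reaches the use useB.
    ps = construct_ps_crthis(
        ["def1", "def2"],
        {"def1": {"useA"}, "def2": {"useA"}, "defB": {"useB"}},
        {"useA": "defB"},
    )
    print(sorted(ps["useB"]))   # ['def1', 'def2']
```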


proc GENLOCS;

$  This procedure enforces the basings chosen for composite
$  objects by generating 'base insertion' ('locate')
$  instructions for all variable occurrences whose values
$  might be incorporated into a composite object.  For
$  example, the instruction

$        S1 := S with X;

$  leads to the basing relation

$        X: ∈B;

$  where B is the base previously assigned to the variable
$  occurrence of S.  This basing relation for X is enforced
$  by emitting 'locate' instructions for the ovariable
$  occurrences belonging to the set PS-CRTHIS{X} (except in
$  certain cases discussed below).  Here, PS-CRTHIS{X} is
$  the set of pseudo creation points of X, i.e., the
$  occurrences which are the ovariables of value creation or
$  value retrieval instructions and whose values can be
$  transmitted to X through simple assignment instructions.

$  A similar approach is taken to map retrieval and store
$  operations.  If in phase I the map F has been assigned
$  the mode 'map(∈B1) ∈B2', then the instruction
$
$      F(X) := Y ;
$
$  will imply the basing relations
$
$      X : ∈B1 ;
$      Y : ∈B2 ;
$
$  In this case, locate instructions (into B1 and B2) are
$  emitted for the occurrences in PS-CRTHIS{X} and
$  PS-CRTHIS{Y}, respectively.


$  Note that these 'locate' instructions are not directly
$  inserted into the code, but are kept in a temporary set,
$  for the following reasons :

$  A) The bases being used at this stage are not the actual
$  bases which will appear at run-time.  Actual bases will
$  be determined subsequently by building up equivalence
$  classes of the base names introduced in phase I.

$  B) Some bases may eventually prove useless, because they
$  support only one composite object, in which case all
$  'locate' instructions which reference them must be
$  dropped.


$  As we proceed in enforcing basing relations, equivalence
$  relations emerge among bases.  When about to generate a
$  locate instruction to insert a pseudo creation point Y
$  into the base B1 of X, we check to see if Y is the
$  ovariable of a value retrieval instruction and if the
$  composite object S from which Y is retrieved has been
$  domain based on a base B2 (i.e., if S is of a definite
$  gross type and a base has been introduced for it during
$  phase I).  In this case, Y will be member based on B2, and
$  we just equivalence the bases B1 and B2 without
$  generating any locate instruction.  Moreover, if the
$  above condition is not satisfied but Y has already
$  been assigned a locate instruction which will insert Y
$  into a base B3, we still do not generate a new locate
$  instruction, but just equivalence the bases B1 and B3.
$  Certain other instructions force similar base
$  equivalencing rather than generating locate instructions
$  (e.g., set union and intersection force their arguments
$  to have the same base).

$  If we have equivalenced two bases B1 and B2, then B1 and
$  B2 are considered as two names of the same actual base
$  B (which will emerge subsequently as the representative
$  of the equivalence class to which B1 and B2 belong).

$  The process of base equivalencing and locate generation
$  just described is complicated by the existence of
$  procedure calls and the need to take variable and base
$  scopes into account.  For a given variable occurrence
$  VO, for which a base BO has been suggested, the
$  following may be the case :

$  A) VO is an occurrence of a global variable V.  Then it is
$  reasonable to assign the same basing to all its
$  occurrences (or, more precisely, to associate one global
$  base with each of its live periods; see above).  The
$  base associated with such variables is therefore called
$  a global base.

$  B) VO is an occurrence of a formal parameter of the
$  procedure P.  Then if a base exists for VO, this base is
$  a formal one ; each call to P will instantiate it by
$  passing to P some actual base AB (which will be the
$  base of the actual calling parameter AV to which VO
$  corresponds).  It is then reasonable to require that all
$  actual parameters at various points of call have the same
$  form as that chosen for VO, but in each case we allow
$  the actual bases to be distinct.  It would be unwise to
$  equivalence these bases (since equivalencing more bases
$  than strictly necessary may lead to the creation of very
$  sparse objects), but it is reasonable to equivalence all
$  the bases which may appear at a given point of call.
$  This is achieved by partitioning PS-CRTHIS{VO} according
$  to the points-of-call by which a given occurrence VOX
$  becomes the value of VO.  Then the bases occurring in
$  each such partition can be equivalenced.

$  Note that if VO is not a formal parameter, but is
$  nevertheless linked to the formal parameters of P
$  through value-flow, then the preceding remarks still
$  apply : VO may be based on a formal base, i.e., some base
$  of the formal parameters of P.  In such cases, the same
$  partitioning of PS-CRTHIS according to points-of-call is
$  used.

$  C) Finally, VO may be local to P, i.e., it may be a local
$  variable whose value is created only within P, and which
$  does not enter into any operation whose other arguments
$  are global or linked to points of call of P.  In that
$  case, VO (and the other arguments of operations in which
$  VO appears) receives an actual local base.


$  In order to simplify the mode adjustment phase, the
$  arbitrary basings chosen for tuple components and for
$  the range of maps in the preceding phase are propagated
$  during the present phase in the same way as the basings
$  of set elements.  Operations of incorporation, i.e.,
$  tuple assignments, are treated as map stores, and the
$  same base equivalencing procedure is used in all cases.

$  Base equivalencing is carried out by using a compressed
$  balanced tree technique.  Equivalence classes of bases
$  are represented by a forest of trees.  The root of each
$  such tree is the representative (and is called the real
$  base) of the bases in the tree.  Trees are structured by
$  the map PARENT ; PARENT(B) points to the parent node of B
$  in the tree containing B if B is not a root ; otherwise
$  PARENT(B) is undefined.

$  This routine is called by the main routine AUTO-DATA and
$  calls the following routines : MERGEOBJ, PROPELMT,
$  PROPOFMAP, PROPOFTUP, PROPSOFMAP, PROPSOFTUP and
$  PROPSOFAMAP.  All of these routines perform similar
$  functions, namely generating locate instructions and
$  merging bases, in a manner depending on the operation of
$  the instruction being processed.

$  The global variables referenced by this routine include
$      BLOCKS          -  set of code blocks
$      ARGS(I)         -  arguments of instruction I
$      OPCODE(I)       -  operation code of instruction I
$      PS-CRTHIS{OI}   -  pseudo creation points of OI

$  The  macros  used  in  this  routine  include 
$      FORALLCODE(B, I)  -  for each instruction I in block B,

$  -  see  (n^) . 

$      MGTYP(OI)    -  gross type of occurrence OI, see (M26).
$      IS-PRIM(M)   -  indicates whether M is a primitive mode,
$                   -  see (M17).

$  The  local  variables  defined  in  this  routine  are 

repr 

IV1, IV2, OV : ∈OI-BASE ;    $ variable occurrences

B : ∈BLOCK-BASE ;  $ code block

end  repr  ; 

$  Iterate through each instruction of the program.
(∀ B ∈ BLOCKS, FORALLCODE(B, I))

[OV, IV1, IV2] := ARGS(I) ;   $ unpack instruction
case  OPCODE(I)  of 

$  For a comparison operation : equivalence the bases
$  of the ivariables if they are composite objects.

(Q1-EQ, Q1-NE, Q1-INCS) :

if not IS-PRIM(MGTYP(IV1)) then

MERGEOBJ(IV1, PS-CRTHIS{IV1}, IV2, PS-CRTHIS{IV2}) ;
end  if  ; 

$  For a simple assignment : equivalence the bases of the
$  ivariable and the ovariable if they are composite
$  objects.

(Q1-ASN, Q1-ARGIN, Q1-PUSH, Q1-POP) :

if not IS-PRIM(MGTYP(IV1)) then

MERGEOBJ(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1}) ;
end  if  ; 

$  For an algebraic operation : equivalence the bases of
$  the ivariables and the ovariable if they are
$  composite objects.

(Q1-ADD, Q1-SUB, Q1-MULT, Q1-MOD) :

if not IS-PRIM(MGTYP(IV1)) then

MERGEOBJ(IV1, PS-CRTHIS{IV1}, IV2, PS-CRTHIS{IV2}) ;

MERGEOBJ(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1}) ;
end  if  ; 

$  For a set or tuple insertion or deletion operation :
$  equivalence the bases of the ovariable and the first
$  argument, and generate locate instructions for the
$  second argument.

(Q1-WITH, Q1-LESS) :

if MGTYP(IV1) = TSET or MGTYP(IV1) = THTUP then

MERGEOBJ(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1}) ;

PROPELMT(IV1, PS-CRTHIS{IV1}, IV2, PS-CRTHIS{IV2}) ;
end  if  ; 

$  For a membership operation : generate locate
$  instructions for the first argument.

(Q1-IN, Q1-NOTIN) :

if MGTYP(IV2) = TSET or MGTYP(IV2) = THTUP then

PROPELMT(IV2, PS-CRTHIS{IV2}, IV1, PS-CRTHIS{IV1}) ;
end  if  ; 

$  For a set or tuple former : generate locate
$  instructions for each component.

(Q1-SET, Q1-SET1, Q1-TUP, Q1-TUP1) :

if MGTYP(OV) = TSET or MGTYP(OV) = THTUP then

PROPELMT(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1}) ;
end  if  ; 

$  For a map or tuple retrieval operation : generate
$  locate instructions for the index variable.

(Q1-OF, Q1-OFA, Q1-OFB) :

if MGTYP(IV1) = TMAP then   $ IV1 is a map.

PROPOFMAP(IV1, PS-CRTHIS{IV1}, IV2, PS-CRTHIS{IV2}) ;
elseif TTUP incl MGTYP(IV1) then   $ IV1 is a tuple.

PROPOFTUP(IV1, PS-CRTHIS{IV1}, IV2) ;
end  if  ; 

$  For a single-valued storage operation on a map or tuple :
$  generate locate instructions for the ivariables.

(Q1-SOF, Q1-LESSF) :

if MGTYP(IV1) = TMAP then   $ IV1 is a map.

PROPSOFMAP(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1}, IV2,
           PS-CRTHIS{IV2}) ;
elseif TTUP incl MGTYP(IV1) then   $ IV1 is a tuple.

PROPSOFTUP(OV, PS-CRTHIS{OV}, IV1, IV2, PS-CRTHIS{IV2}) ;
end  if  ; 

$  For a multi-valued storage operation F{X} := S :
$  treat the right-hand side differently, and invoke a
$  separate routine.

(Q1-SOFA) : PROPSOFAMAP(OV, PS-CRTHIS{OV}, IV1, PS-CRTHIS{IV1},
                        IV2, PS-CRTHIS{IV2}) ;

else   $ Other opcodes are not examined.

continue ∀ ;
end case ;

end ∀ B ;

return  ; 

end  proc  GENLOCS  ; 
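The "compressed balanced tree" representation of base equivalence classes, described in the GENLOCS comments, corresponds to the familiar union-find structure. Below is a minimal Python sketch under these assumptions:
- "Balanced" means union by size.
- "Compressed" means path compression.
- The class and method names are hypothetical.
- `parent` mirrors the PARENT map: roots, the real bases, have no entry.

```python
class BaseEquivalence:
    """Hypothetical forest of base names; each root is a real base."""

    def __init__(self):
        self.parent = {}  # like PARENT: defined only for non-root bases
        self.size = {}    # number of base names in each root's tree

    def real_base(self, b):
        """Return the root of b's tree, compressing the path on the way."""
        root = b
        while root in self.parent:
            root = self.parent[root]
        while b != root:  # re-point every node on the path at the root
            nxt = self.parent[b]
            self.parent[b] = root
            b = nxt
        return root

    def equivalence(self, b1, b2):
        """Merge the classes of b1 and b2; return the surviving real base."""
        r1, r2 = self.real_base(b1), self.real_base(b2)
        if r1 == r2:
            return r1
        if self.size.get(r1, 1) < self.size.get(r2, 1):
            r1, r2 = r2, r1  # hang the smaller tree under the larger one
        self.parent[r2] = r1
        self.size[r1] = self.size.get(r1, 1) + self.size.pop(r2, 1)
        return r1
```

Suppose B1 is equivalenced with B2, B3 with B4, and then B2 with B4. All four names then share one real base. A base that was never equivalenced, such as B5, remains its own real base.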


$  Now follows a family of routines, all of which perform
$  similar functions, namely equivalencing bases and
$  inserting locate instructions, but for different kinds
$  of operations.  This family consists of the routines
$  MERGEOBJ, PROPELMT, PROPOFMAP, PROPOFTUP, PROPSOFMAP,
$  PROPSOFTUP and PROPSOFAMAP.  Because of their
$  similarities, detailed documentation is provided only in
$  the routine MERGEOBJ.  Please refer to this routine
$  wherever documentation seems lacking in the other
$  routines of this group.

proc MERGEOBJ(V1, CR1, V2, CR2) ;

$  This procedure equivalences the bases of composite objects
$  which are arguments of the same instruction.  However,
$  no equivalencing is performed if V1 and V2 are of
$  different gross types.  CR1 and CR2 must be the pseudo
$  creation points of V1 and V2, respectively.

$  In order to take interprocedural calls into account, this
$  routine calls the routine PARTITION to partition CR1 and
$  CR2 according to the points from which the routine
$  containing V1 and V2 is called.  The values PCR1 and
$  PCR2 returned by the routine PARTITION are the mappings
$  which map the points from which the routine containing
$  V1 and V2 is called to the pseudo creation points in CR1
$  and CR2.  Elements in the image sets of PCR1 and PCR2
$  are then equivalenced.

$  This routine is called by the routine GENLOCS and calls the
$  routines MERGE and PARTITION.

$  The global variables referenced by this routine include
$      IS-GLOB(V)    -  indicates whether V is a global variable
$      IS-FORMAL(B)  -  indicates whether B is a formal base

$  The macros used in this routine include
$      MGTYP(OI)     -  gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   -  name of occurrence OI, see (M8).
$      ELMBASE(OI)   -  base of the elements of occurrence OI,
$                    -  see (M28).
$  -  see  (ri28)  . 
$  The local variables defined in this routine are

repr

V1, V2, OBJ, OI : ∈OI-BASE ;   $ variable occurrences

CR1, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;

$ pseudo creation points

VS1, VS2 : set(∈OI-BASE) ;   $ set of variable occurrences

CALL : ∈RC-BASE ;  $ RC-string

PCR1, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ;

$ maps from RC-strings to sets
$ of OI ; generated by the
$ routine PARTITION

CL : set(∈OI-BASE) ;      $ set of variable occurrences
end  repr  ; 

if MGTYP(V1) /= MGTYP(V2) then

$  Return if V1 and V2 are of different gross types.

return ;
end if ;

if IS-GLOB(OI-NAME(V1)) or IS-GLOB(OI-NAME(V2)) then

$  If either V1 or V2 is a global variable, then turn off
$  the IS-FORMAL flags for the bases of V1 and V2.

IS-FORMAL(ELMBASE(V1)) := FALSE ;
IS-FORMAL(ELMBASE(V2)) := FALSE ;


$  Merge the bases of V1, V2 and the occurrences in CR1
$  and CR2.

VS1 := {OI, [-, OI] ∈ CR1} ;

VS2 := {OI, [-, OI] ∈ CR2} ;
MERGE((VS1 + VS2) with V2, V1) ;


else 


$  Otherwise, V1 and V2 are argument variables or
$  variables local to a procedure.

$  Partition CR1 and CR2 into equivalence classes,
$  according to the points-of-call through which they
$  transmit their values to V1 and V2.  In the case of
$  very local variables, only one partition is
$  produced, because all RC-strings in PS-CRTHIS are
$  empty (values are generated within the procedure
$  itself).

PCR1 := PARTITION(CR1) ;
PCR2 := PARTITION(CR2) ;

$  Check to see if both variables are very local.  If so,
$  their bases are not formal.  Note that V1 (or V2) is
$  very local if and only if PCR1 (or PCR2) is only
$  defined on the NULL-PATH.

if DOMAIN PCR1 = {NULL-PATH} and DOMAIN PCR2 = {NULL-PATH} then

IS-FORMAL(ELMBASE(V1)) := FALSE ;

IS-FORMAL(ELMBASE(V2)) := FALSE ;
end  if  ; 

$  Now merge the bases appearing in each class of pseudo
$  creation points.

(∀ CL := PCR1{CALL})

$  CALL is a point-of-call.

if CALL = NULL-PATH then

$  For pseudo creation points in the routine
$  containing V1 and V2, merge the bases
$  appearing in the pseudo creation points and
$  the bases of V1 and V2.

MERGE((CL + PCR2{NULL-PATH}) with V2, V1) ;
else

$  For pseudo creation points which are in a
$  different routine from V1, merge the bases
$  appearing in pseudo creation points
$  according to their points-of-call.

$  Choose an element having the same gross type
$  as V1 as the representative of its class.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V1) then

MERGE(CL + PCR2{CALL}, OBJ) ;
end if ;
end if CALL ;

end ∀ CL ;
end if IS-GLOB ;

return  ; 
end proc MERGEOBJ ;
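The non-global branch of MERGEOBJ can be sketched as a hypothetical Python rendering under these assumptions:
- PARTITION groups pseudo creation points by their RC-string. Its actual definition is not part of this section.
- RC-strings are tuples, with NULL-PATH as the empty tuple.
- `merge(occs, rep)` stands for MERGE: it equivalences the bases of all occurrences in `occs` with the base of `rep`.

```python
from collections import defaultdict

NULL_PATH = ()


def partition(cr):
    """Hypothetical PARTITION: RC-string -> set of pseudo creation points."""
    classes = defaultdict(set)
    for rc, oi in cr:
        classes[rc].add(oi)
    return dict(classes)


def merge_obj_local(v1, cr1, v2, cr2, gross_type, merge):
    """Non-global branch of MERGEOBJ (sketch).

    cr1, cr2: sets of (RC-string, occurrence) pairs, as in PS-CRTHIS.
    gross_type: occurrence -> gross type (stands in for MGTYP).
    merge(occs, rep): equivalence the bases of occs with the base of rep.
    """
    if gross_type[v1] != gross_type[v2]:
        return  # different gross types: nothing to equivalence
    pcr1, pcr2 = partition(cr1), partition(cr2)
    for call, cl in pcr1.items():
        if call == NULL_PATH:
            # Values created in this routine: merge directly with V1 and V2.
            merge(cl | pcr2.get(NULL_PATH, set()) | {v2}, v1)
        else:
            # Values arriving through one point of call: any member of
            # V1's gross type may serve as the class representative.
            rep = next((o for o in cl if gross_type[o] == gross_type[v1]), None)
            if rep is not None:
                merge(cl | pcr2.get(call, set()), rep)
```

Because each point of call gets its own `merge`, bases arriving through different calls stay distinct. This is the point of the partitioning described in the GENLOCS comments: it avoids very sparse objects.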


proc PROPELMT(V1, CR1, V2, CR2) ;

$  This procedure handles set insertion and membership
$  operations.  V1 is a composite object, and V2 must be an
$  element of its base.  We generate locate instructions
$  for elements of CR2, and merge the elements of CR1 as in
$  the previous procedure.

$  This routine is called by the routine GENLOCS and calls the
$  routines MERGE, PARTITION and INSERTLOCS.

$  The global variables referenced by this routine include
$      IS-GLOB(V)    -  indicates whether V is a global variable
$      IS-FORMAL(B)  -  indicates whether B is a formal base
$      MODE(OI)      -  mode of occurrence OI

$  The macros used in this routine include
$      MGTYP(OI)     -  gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   -  name of occurrence OI, see (M8).
$      COMPTYP(M)    -  element mode of mode descriptor M, see (M19).
$      ELMBASE(OI)   -  base of the elements of occurrence OI,
$                    -  see (M28).

$  The  local  variables  defined  in  this  routine  are 


repr 


V1, V2, OBJ, OI : ∈OI-BASE ;   $ variable occurrences
CR1, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;
$ pseudo creation points
VS1, VS2 : set(∈OI-BASE) ;   $ set of variable occurrences
CALL : ∈RC-BASE ;         $ RC-string
PCR1, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ;

$ maps from RC-strings to sets
$ of OI ; generated by the
$ routine PARTITION
CL : set(∈OI-BASE) ;       $ set of variable occurrences
end  repr  ; 

$  Assign V2 the element mode of V1.

MODE(V2) := COMPTYP(MODE(V1)) ;

if IS-GLOB(OI-NAME(V1)) or IS-GLOB(OI-NAME(V2)) then

$  If either V1 or V2 is a global variable, then turn
$  off the IS-FORMAL flag for the base of V1.

IS-FORMAL(ELMBASE(V1)) := FALSE ;

$  Generate locate instructions to insert the elements of
$  CR2 into the base of V1.

VS2 := {OI, [-, OI] ∈ CR2} ;
INSERTLOCS(VS2, ELMBASE(V1)) ;

$  Merge the base of V1 and the bases of the occurrences
$  in CR1.

VS1 := {OI, [-, OI] ∈ CR1} ;

MERGE(VS1, V1) ;


else 


$  Partition CR1 and CR2 into equivalence classes,
$  according to points-of-call.

PCR1 := PARTITION(CR1) ;
PCR2 := PARTITION(CR2) ;

$  Check to see if both variables are very local.  If so,
$  their bases are not formal.  Note that V1 (or V2) is
$  very local if and only if PCR1 (or PCR2) is only
$  defined on the NULL-PATH.

if DOMAIN PCR1 = {NULL-PATH} and DOMAIN PCR2 = {NULL-PATH} then

IS-FORMAL(ELMBASE(V1)) := FALSE ;

IS-FORMAL(ELMBASE(V2)) := FALSE ;
end  if  ; 

$  Merge the bases and generate locate instructions
$  according to the relevant points-of-call.

(∀ CL := PCR1{CALL})

if CALL = NULL-PATH then

$  For pseudo creation points of V2 which are in
$  the same routine as V1, generate locate
$  instructions to insert the created values
$  into the base of the elements of V1.

INSERTLOCS(PCR2{CALL}, ELMBASE(V1)) ;

$  Merge the domain bases of the pseudo creation
$  points of V1 and the domain base of V1.

MERGE(CL, V1) ;

else

$  For pseudo creation points which are in a
$  different routine from V1, choose an element
$  of the same gross type as V1 as the
$  representative of the class, and perform
$  locate generation and base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V1) then
INSERTLOCS(PCR2{CALL}, ELMBASE(OBJ)) ;
MERGE(CL, OBJ) ;
end if ;
end if CALL ;

end ∀ CL ;
end  if  IS-GLOB; 

return  ; 

end  proc  PROPELMT  ; 
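INSERTLOCS itself is not shown in this section. However, the GENLOCS comments describe the decision it must make for each pseudo creation point Y and target base B1:
- If Y is retrieved from an object already domain based on B2, equivalence B1 with B2 instead of locating.
- Otherwise, if Y already has a pending locate into B3, equivalence B1 with B3.
- Otherwise, record a new locate.

Below is a hypothetical Python sketch of that decision. All parameter names are illustrative assumptions. Pending locates are kept in a dictionary rather than emitted into the code, reflecting the comment that they are held in a temporary set.

```python
def insert_locs(occs, base, retrieved_from, elm_base, locates, equivalence):
    """Hypothetical locate-or-equivalence decision (sketch).

    occs: pseudo creation points Y to be inserted into `base` (B1).
    retrieved_from: Y -> composite object S it was retrieved from, if any.
    elm_base: S -> base B2 on which S is domain based, if one was assigned.
    locates: Y -> base; the pending (not yet emitted) locate instructions.
    equivalence(b1, b2): record that two base names denote one actual base.
    """
    for y in sorted(occs):  # sorted only to make the sketch deterministic
        s = retrieved_from.get(y)
        if s is not None and s in elm_base:
            equivalence(base, elm_base[s])  # Y is already member based on B2
        elif y in locates:
            equivalence(base, locates[y])   # reuse the pending locate into B3
        else:
            locates[y] = base               # a genuinely new locate
```

Consider a value retrieved from S, where S is based on B2: it produces only the equivalence (B1, B2). A fresh value gets a pending locate. A later request to locate that same value elsewhere yields only an equivalence.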


proc PROPOFMAP(V1, CR1, V2, CR2) ;

$  This procedure processes a map retrieval operation
$
$      Y := F(X) ;
$
$  V1 is the map F, and V2 the logical index X.  We generate
$  'locate' instructions to insert the occurrences
$  appearing in CR2 into the domain base of V1, and merge
$  the bases of all occurrences appearing in CR1.  The code
$  for this procedure is identical to that for PROPELMT,
$  except for the use of the domain base of V1 instead of
$  the element base which appears in the set case.

$  This routine is called by the routine GENLOCS and calls the
$  routines MERGE, PARTITION and INSERTLOCS.

$  The global variables referenced by this routine include
$      MODE(OI)      -  mode of occurrence OI
$      IS-GLOB(V)    -  indicates whether V is a global variable
$      IS-FORMAL(B)  -  indicates whether B is a formal base

$  The macros used in this routine include
$      MGTYP(OI)     -  gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   -  name of occurrence OI, see (M8).
$      DOMTYP(M)     -  domain mode of mode descriptor M, see (M22).
$      DOMBASE(OI)   -  domain base of occurrence OI, see (M29).
$      RANBASE(OI)   -  range base of occurrence OI, see (M30).

$  The  local  variables  defined  in  this  routine  are 


repr 


V1, V2, OBJ, OI : ∈OI-BASE ;   $ variable occurrences
CR1, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;

$ pseudo creation points
VS1, VS2 : set(∈OI-BASE) ;   $ set of variable occurrences
CALL : ∈RC-BASE ;  $ RC-string

PCR1, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ;

$ maps from RC-strings to sets
$ of OI ; generated by the
$ routine PARTITION

CL : set(∈OI-BASE) ;      $ set of variable occurrences


end  repr  ; 


$  Assign V2 the element mode of the domain of V1.

MODE(V2) := DOMTYP(MODE(V1)) ;

if IS-GLOB(OI-NAME(V1)) or IS-GLOB(OI-NAME(V2)) then

$  If either V1 or V2 is a global variable, then turn
$  off the IS-FORMAL flags for the domain base and the range
$  base of V1.

IS-FORMAL(DOMBASE(V1)) := FALSE ;
IS-FORMAL(RANBASE(V1)) := FALSE ;

$  Generate locate instructions to insert the elements of
$  CR2 into the domain base of V1.

VS2 := {OI, [-, OI] ∈ CR2} ;
INSERTLOCS(VS2, DOMBASE(V1)) ;

$  Merge the base of V1 and the bases of the occurrences
$  in CR1.

VS1 := {OI, [-, OI] ∈ CR1} ;
MERGE(VS1, V1) ;


else 


$  Otherwise, V1 and V2 are argument variables or
$  variables local to a procedure.

$  Partition CR1 and CR2 into equivalence classes,
$  according to points-of-call.

PCR1 := PARTITION(CR1) ;
PCR2 := PARTITION(CR2) ;

$  Check to see if both variables are very local.  If so,
$  their bases are not formal.  Note that V1 (or V2) is
$  very local if and only if PCR1 (or PCR2) is only
$  defined on the NULL-PATH.

if DOMAIN PCR1 = {NULL-PATH} and DOMAIN PCR2 = {NULL-PATH} then

IS-FORMAL(DOMBASE(V1)) := FALSE ;

IS-FORMAL(RANBASE(V1)) := FALSE ;
end  if  ; 

$  Merge the bases and generate locate instructions
$  according to points-of-call.

(∀ CL := PCR1{CALL})

if CALL = NULL-PATH then

$  For pseudo creation points of V2 which are in
$  the same routine as V1, generate locate
$  instructions to insert them into the domain
$  base of V1.

INSERTLOCS(PCR2{CALL}, DOMBASE(V1)) ;

$  Merge the domain bases of the pseudo creation
$  points and the domain base of V1.

MERGE(CL, V1) ;

else

$  For pseudo creation points which are in a
$  different routine from V1, choose an element
$  of the same gross type as V1 as the
$  representative of the class, and perform
$  locate generation and base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V1) then
INSERTLOCS(PCR2{CALL}, DOMBASE(OBJ)) ;
MERGE(CL, OBJ) ;
end if ;
end if ;

end ∀ CL ;
end  if  IS-GLOB  ; 

return  ; 

end proc PROPOFMAP ;


proc PROPSOFMAP(V0, CR0, V1, CR1, V2, CR2) ;

$  This procedure processes the instruction :

$      F(X) := Y ;

$  V0, V1 and V2 correspond to F, X and Y, respectively.
$  As before, locate instructions into the domain base of V0
$  are generated for all occurrences in CR1.  In addition,
$  locate instructions into the range base of V0 are
$  generated for all objects in CR2.  The bases of the
$  occurrences appearing in CR0 are also merged with the
$  base of V0.

$  This routine is called by the routine GENLOCS and calls the
$  routines MERGE, PARTITION and INSERTLOCS.

$  The global variables referenced by this routine include
$      IS-GLOB(V)    -  true if V is a global variable
$      IS-FORMAL(B)  -  true if B is a formal base

$  The macros used in this routine include
$      MGTYP(OI)     -  gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   -  name of occurrence OI, see (M8).
$      COMPTYP(M)    -  element mode of mode descriptor M, see (M19).
$      DOMTYP(M)     -  domain mode of mode descriptor M, see (M22).
$      RANTYP(M)     -  range mode of mode descriptor M, see (M23).

$  The  local  variables  defined  in  this  routine  are 

repr 

V0, V1, V2, OBJ : ∈OI-BASE ;   $ variable occurrences
CR0, CR1, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;

$ pseudo creation points

VS0, VS1, VS2 : set(∈OI-BASE) ;

$ set of variable occurrences

CALL : ∈RC-BASE ;  $ RC-string

PCR0, PCR1, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ;

$ maps from RC-strings to sets
$ of OI ; generated by the
$ routine PARTITION

CL : set(∈OI-BASE) ;       $ set of variable occurrences
end  repr  ; 

$  Assign X and Y the element mode of the domain and the
$  element mode of the range of V0, respectively.

MODE(V1) := DOMTYP(MODE(V0)) ;

MODE(V2) := RANTYP(MODE(V0)) ;

if IS-GLOB(OI-NAME(V0)) or IS-GLOB(OI-NAME(V1)) then

$  If either V0 or V1 is a global variable, then turn
$  off the IS-FORMAL flags of the domain base and the range
$  base of V0.

IS-FORMAL(DOMBASE(V0)) := FALSE ;
IS-FORMAL(RANBASE(V0)) := FALSE ;

$  Generate locate instructions to insert the occurrences
$  in CR1 and CR2 into the domain base and the range base
$  of V0, respectively.

VS1 := {OI, [-, OI] ∈ CR1} ;
INSERTLOCS(VS1, DOMBASE(V0)) ;
VS2 := {OI, [-, OI] ∈ CR2} ;
INSERTLOCS(VS2, RANBASE(V0)) ;

$  Merge the base of V0 and the bases of the occurrences
$  in CR0.

VS0 := {OI, [-, OI] ∈ CR0} ;

MERGE(VS0, V0) ;


else 


$  Otherwise, V0 and V1 are argument variables or
$  variables local to a procedure.

$  Partition CR0, CR1 and CR2 into equivalence classes,
$  according to points-of-call.

PCR0 := PARTITION(CR0) ;
PCR1 := PARTITION(CR1) ;
PCR2 := PARTITION(CR2) ;

$  Check to see if both variables are very local.  If so,
$  their bases are not formal.  Note that V0 (or V2)
$  is very local if and only if PCR0 (or PCR2) is only
$  defined on the NULL-PATH.

if DOMAIN PCR0 = {NULL-PATH} and DOMAIN PCR2 = {NULL-PATH} then

IS-FORMAL(DOMBASE(V0)) := FALSE ;

IS-FORMAL(RANBASE(V0)) := FALSE ;
end  if  ; 
$  Merge the bases and generate locate instructions
$  according to points-of-call.

(∀ CL := PCR0{CALL})

if CALL = NULL-PATH then

$  Generate locate instructions for the pseudo
$  creation points of V1 and V2 which are in
$  the same routine as V0, to insert them into
$  the domain base and the range base of V0.

INSERTLOCS(PCR1{CALL}, DOMBASE(V0)) ;
INSERTLOCS(PCR2{CALL}, RANBASE(V0)) ;

$  Merge the domain base of V0 with the domain
$  bases of the pseudo creation points of V0
$  which are in the same routine as V0.

MERGE(CL, V0) ;

else 

$  For pseudo creation points which are in a
$  different routine from V0, choose an element
$  of the same gross type as V0 as the
$  representative of the class, and perform
$  locate generation and base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V0) then
INSERTLOCS(PCR1{CALL}, DOMBASE(OBJ)) ;
INSERTLOCS(PCR2{CALL}, RANBASE(OBJ)) ;
MERGE(CL, OBJ) ;
end if ;
end if ;
end ∀ CL ;
end  if  IS-GLOB  ; 

return  ; 

end  proc  PROPSOFMAP  ; 
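This Python sketch illustrates what the domain and range locates emitted by PROPSOFMAP buy at run time. It is hedged: the layout shown is one possible based layout, assumed rather than taken from this section.
- A map F of mode 'map(∈B1) ∈B2' is stored as a field in the element blocks of its domain base B1.
- Each such field holds a reference to an element block of the range base B2.
- Once X has been located in B1 and Y in B2, the store F(X) := Y is a single field assignment.

The `Base` class and the helper names are hypothetical.

```python
class Base:
    """Hypothetical base: one element block per distinct value."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # value -> element block (a dict of named fields)

    def locate(self, value):
        """Insert value into the base if absent; return its element block."""
        return self.blocks.setdefault(value, {"value": value})


def store_based_map(field, dom_base, rng_base, x, y):
    """F(X) := Y for F : map(∈B1) ∈B2, kept as a field of B1's blocks."""
    x_block = dom_base.locate(x)  # the locate into the domain base
    y_block = rng_base.locate(y)  # the locate into the range base
    x_block[field] = y_block      # the store itself is a field write


def fetch_based_map(field, dom_base, x):
    """Y := F(X); returns None where F(X) is undefined (SETL's OM)."""
    x_block = dom_base.blocks.get(x)
    if x_block is None or field not in x_block:
        return None
    return x_block[field]["value"]
```

Two maps sharing the same domain and range bases store references to the same element blocks. This sharing is what equivalencing bases makes possible.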


proc PROPOFTUP(V1, CR1, V2) ;

$  This procedure processes a tuple retrieval operation
$
$      Y := T(I) ;
$
$  V1 corresponds to the tuple T, and V2 to the index.  We
$  merge the bases of all occurrences in CR1 and V1.

$  This routine is called by the routine GENLOCS.

$  This routine calls the routines MERGE, PARTITION and
$  INSERTLOCS.

$  The global variables referenced by this routine include
$      IS-GLOB(V)    -  true if V is a global variable
$      IS-FORMAL(B)  -  true if B is a formal base

$  The macros used in this routine include
$      MGTYP(OI)     -  gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   -  name of occurrence OI, see (M8).
$      ELMBASE(OI)   -  base of the elements of occurrence OI,
$                    -  see (M28).

$      COMBASE(OI, N)  -  base of the N-th component of OI, see (M31).

$      GLTYP(OI)     -  length of occurrence OI, see (M27).

$  The local variables defined in this routine are


repr 

V1, V2, OBJ : ∈OI-BASE ;      $ variable occurrences

CR1 : set([∈RC-BASE, ∈OI-BASE]) ;

$ pseudo creation points

VS1 : set(∈OI-BASE) ;     $ set of variable occurrences

CALL : ∈RC-BASE ;  $ RC-string

PCR1 : mmap(∈RC-BASE) set(∈OI-BASE) ;

$ maps from RC-strings to sets
$ of OI ; generated by the
$ routine PARTITION

CL : set(∈OI-BASE) ;       $ set of variable occurrences
end  repr  ; 

case MGTYP(V1) of

(THTUP) :

$  V1 is a tuple of unknown length.

if IS-GLOB(OI-NAME(V1)) then

$  If V1 is a global variable, then turn off the IS-FORMAL
$  flag for V1.

IS-FORMAL(ELMBASE(V1)) := FALSE ;

$  Merge the base of V1 and the bases of the occurrences
$  in CR1.

VS1 := {OI, [-, OI] ∈ CR1} ;
MERGE(VS1, V1) ;


else 


$ Partition CR1 into equivalence classes, according to
$ points-of-call.

PCR1 := PARTITION(CR1) ;

$ If V1 is a local variable, its base is not formal.

if DOMAIN PCR1 = {NULL-PATH} then

IS-FORMAL(ELMBASE(V1)) := FALSE ;
end if ;

$ Merge bases according to points-of-call.

(∀ CL := PCR1{CALL})

if CALL = NULL-PATH then

$ For the pseudo creation points of V1 which are
$ in the same routine as V1, merge their bases
$ with the base of V1.

MERGE(CL, V1) ;


else

$ For pseudo creation points which are in a
$ different routine from V1, choose an element
$ of the same gross type as V1 as the
$ representative of the class, and perform
$ base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V1) then

MERGE(CL, OBJ) ;
end if ;
end if ;
end ∀ CL ;
end  if  IS-GLOB; 


(TMTUP)  : 


$ V1 is a tuple of known length.

if IS-GLOB(OI-NAME(V1)) then
(∀ IX := 1...GLTYP(V1))

$ If V1 is a global variable, the bases of its
$ components are not formal.

IS-FORMAL(COMBASE(V1, IX)) := FALSE ;
end ∀ ;

$ Merge the bases of the components of V1 and the bases
$ of the occurrences in CR1.

VS1 := {OI, [-, OI] ∈ CR1} ;
MERGE(VS1, V1) ;

else

$ Partition CR1 into equivalence classes, according to
$ points-of-call.

PCR1 := PARTITION(CR1) ;

$ If V1 is a local variable, its base is not formal.

if DOMAIN PCR1 = {NULL-PATH} then

(∀ IX := 1...GLTYP(V1)) IS-FORMAL(COMBASE(V1, IX)) := FALSE ;;
end if ;

$ Merge bases according to points-of-call.

(∀ CL := PCR1{CALL})

if CALL = NULL-PATH then

$ For the pseudo creation points of V2 which are
$ in the same routine as V1, merge their bases
$ with the base of V1.

MERGE(CL, V1) ;

else 

$ For pseudo creation points which are in a
$ different routine from V1, choose an element
$ of the same gross type as V1 as the
$ representative of the class, and perform
$ base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V1) then

MERGE(CL, OBJ) ;
end if ;
end if ;
end ∀ CL ;

end if IS-GLOB ;

end case ;

return ;

end proc PROPOFTUP ;

proc PROPSOFTUP(V0, CR0, V1, V2, CR2) ;

$ This procedure handles tuple assignments. If a tuple is
$ homogeneous, the process used is identical to that for set
$ insertion, and the procedure PROPELMT is invoked.
$ Otherwise, the integer value of the index V1 is known,
$ and the corresponding base of V0 must be used to locate
$ occurrences in CR2.

$  This  routine  is  called  by  the  routine  GENLOC. 

$  This  routine  calls  the  routines  MERGE,  PROPELMT,  PARTITION 
$  and  INSERTLOCS. 

$  The  global  variables  referenced  by  this  routine  include 

$      MODE(OI)      - mode of occurrence OI
$      IS-GLOB(V)    - true if V is a global variable
$      IS-FORMAL(B)  - true if B is a formal base

$ The macros used in this routine include

$      MGTYP(OI)     - gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   - name of occurrence OI, see (M8).
$      OI-VALUE(OI)  - value of occurrence OI, see (M9).
$      CTYPN(M,I)    - I-th component mode of mode descriptor M,
$                    - see (M20).
$      COMBASE(OI,N) - base of the N-th component of OI, see (M31).
$      GLTYP(OI)     - length of occurrence OI, see (M27).
$      COMPTYP(M)    - element mode of mode descriptor M, see (M19).
$      RANTYP(M)     - range mode of mode descriptor M, see (M23).

$  The  local  variables  defined  in  this  routine  are 

repr 

V0, V1, V2, OBJ : ∈OI-BASE ;                 $ variable occurrence

CR0, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;       $ pseudo creation points

VS0, VS2 : set(∈OI-BASE) ;                   $ set of variable occurrences

CALL : ∈RC-BASE ;                            $ RC-string

PCR0, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ;  $ maps from RC-strings to sets
                                             $ of OI; generated by the
                                             $ routine PARTITION

CL : set(∈OI-BASE) ;                         $ set of variable occurrences
end  repr  ; 

$ If V0 is a tuple of unknown length, invoke the routine
$ PROPELMT and return.

if MGTYP(V0) = THTUP then PROPELMT(V0, CR0, V2, CR2) ; return ; end if ;

INDEX := OI-VALUE(V1) ;            $ value of V1

$ Assign V2 the mode of the V1-th component of V0.

MODE(V2) := CTYPN(MODE(V0), INDEX) ;

if IS-GLOB(OI-NAME(V0)) then
(∀ IX := 1...GLTYP(V0))

$ If V0 is a global variable, the bases of its
$ components are not formal.

IS-FORMAL(COMBASE(V0, IX)) := FALSE ;
end ∀ ;

$ Generate locate instructions to insert the occurrences
$ in CR2 into the proper base of V0.

VS2 := {OI, [-, OI] ∈ CR2} ;

INSERTLOCS(VS2, COMBASE(V0, INDEX)) ;

$ Merge the base of V0 and the bases of the occurrences
$ in CR0.

VS0 := {OI, [-, OI] ∈ CR0} ;
MERGE(VS0, V0) ;

else 

$ Partition CR0 and CR2 into equivalence classes,
$ according to points-of-call.

PCR0 := PARTITION(CR0) ;
PCR2 := PARTITION(CR2) ;

$ Check to see if both variables are very local. If so,
$ their bases are not formal.

if DOMAIN PCR0 = {NULL-PATH} then

(∀ IX := 1...GLTYP(V0)) IS-FORMAL(COMBASE(V0, IX)) := FALSE ;;

end if ;

$ Merge bases and generate locate instructions according
$ to points-of-call.

(∀ CL := PCR0{CALL})

if CALL = NULL-PATH then

$ For the pseudo creation points of V2 which are
$ in the same routine as V0, generate locate
$ instructions to insert them into the base of
$ the components of V0.

INSERTLOCS(PCR2{CALL}, COMBASE(V0, INDEX)) ;

$ Merge the domain bases of pseudo creation
$ points and the domain base of V0.

MERGE(CL, V0) ;

else

$ For pseudo creation points which are in a
$ different routine from V0, choose an element
$ of the same gross type as V0 as the
$ representative of the class, and perform
$ locate generation and base merging.

if ∃ OBJ ∈ CL | MGTYP(OBJ) = MGTYP(V0) then

INSERTLOCS(PCR2{CALL}, COMBASE(OBJ, INDEX)) ;
MERGE(CL, OBJ) ;
end if ;
end if ;
end ∀ CL ;

end if IS-GLOB ;

return ;

end proc PROPSOFTUP ;


proc PROPSOFAMAP(V0, CR0, V1, CR1, V2, CR2) ;

$ This procedure handles the instruction
$
$     F{X} := S ;
$
$ Two cases arise:

$ A) S is of type 'set'. Then it is a subset of the range
$ of F, and its base must be merged with the range base of
$ F. In this case, X is handled in the same fashion as in
$ the single-valued storage case.

$ B) S is of type 'map'. This will be the case when F is
$ actually a function of two variables, whose mode will
$ emerge as
$
$     mmap(∈B) map(∈B2)
$
$ The range base of F therefore contains maps, and S is an
$ element of it (instead of being a subset, as in the
$ preceding case). S must therefore be 'located' in the
$ range base of F. The instruction is treated in much the
$ same way as a single-valued storage operation.

$  This  routine  is  called  by  the  routine  GENLOC. 

$ This routine calls the routines MERGE, EQUIV, PROPSOFMAP,
$ PARTITION and INSERTLOCS.

$ The global variables referenced by this routine include
$      MODE(OI)      - mode of occurrence OI
$      IS-GLOB(V)    - true if V is a global variable
$      IS-FORMAL(B)  - true if B is a formal base

$ The macros used in this routine include

$      MGTYP(OI)     - gross type of occurrence OI, see (M26).
$      OI-NAME(OI)   - name of occurrence OI, see (M8).
$      ELMBASE(OI)   - base of the elements of occurrence OI,
$                    - see (M28).
$      DOMBASE(OI)   - domain base of occurrence OI, see (M29).
$      RANBASE(OI)   - range base of occurrence OI, see (M30).
$      COMPTYP(M)    - element mode of mode descriptor M, see (M19).
$      DOMTYP(M)     - domain mode of mode descriptor M, see (M22).
$      RANTYP(M)     - range mode of mode descriptor M, see (M23).

$ The local variables defined in this routine are

repr 

V0, V1, V2, OBJ, OBJ0, OBJ2 : ∈OI-BASE ;          $ variable occurrences

VS0, VS1, VS2 : set(∈OI-BASE) ;                   $ sets of variable occurrences

CR0, CR1, CR2 : set([∈RC-BASE, ∈OI-BASE]) ;       $ pseudo creation points

CALL : ∈RC-BASE ;                                 $ RC-string

PCR0, PCR1, PCR2 : mmap(∈RC-BASE) set(∈OI-BASE) ; $ maps from RC-strings to sets
                                                  $ of OI; generated by the
                                                  $ routine PARTITION

CL : set(∈OI-BASE) ;                              $ set of variable occurrences
end  repr  ; 

$ If V2 is a map, call the routine PROPSOFMAP and return.

if MGTYP(V2) = TMAP then

PROPSOFMAP(V0, CR0, V1, CR1, V2, CR2) ;

return ;
end if ;

$ Otherwise, assign V1 the mode of the elements of the
$ domain of V0.

MODE(V1) := DOMTYP(MODE(V0)) ;

$ Equivalence the range base of V0 and the element base of
$ V2.

EQUIV(RANBASE(V0), ELMBASE(V2)) ;

if IS-GLOB(OI-NAME(V0)) then

$ If V0 is a global variable, its domain base and range
$ base are not formal.

IS-FORMAL(DOMBASE(V0)) := FALSE ;
IS-FORMAL(RANBASE(V0)) := FALSE ;

$ Generate locate instructions to insert the occurrences
$ in CR1 into the domain base of V0.

VS1 := {OI, [-, OI] ∈ CR1} ;
INSERTLOCS(VS1, DOMBASE(V0)) ;

$ Merge the base of V0 and the bases of the occurrences
$ in CR0.

VS0 := {OI, [-, OI] ∈ CR0} ;

MERGE(VS0, V0) ;

$ Merge the base of V2 and the bases of the occurrences
$ in CR2.

VS2 := {OI, [-, OI] ∈ CR2} ;
MERGE(VS2, V2) ;

else 


$ Partition CR0, CR1 and CR2 into equivalence classes,
$ according to points-of-call.

PCR0 := PARTITION(CR0) ;
PCR1 := PARTITION(CR1) ;
PCR2 := PARTITION(CR2) ;

$ Check to see if both variables are very local. If so,
$ their bases are not formal.

if DOMAIN PCR0 = {NULL-PATH} and DOMAIN PCR2 = {NULL-PATH} then

IS-FORMAL(DOMBASE(V0)) := FALSE ;
IS-FORMAL(RANBASE(V0)) := FALSE ;
end if ;

$ Merge the bases and generate locate instructions
$ according to points-of-call.

(∀ CL := PCR0{CALL})

if CALL = NULL-PATH then

INSERTLOCS(PCR1{CALL}, DOMBASE(V0)) ;
MERGE(CL, V0) ;
MERGE(PCR2{CALL}, V2) ;
else 

if ∃ OBJ0 ∈ CL | MGTYP(OBJ0) = MGTYP(V0) then

INSERTLOCS(PCR1{CALL}, DOMBASE(OBJ0)) ;
MERGE(CL, OBJ0) ;
end if ;
if ∃ OBJ2 ∈ PCR2{CALL} | MGTYP(OBJ2) = MGTYP(V2) then

MERGE(PCR2{CALL}, OBJ2) ;
end if ;
end if ;
end ∀ CL ;
end  if  IS-GLOB  ; 

return  ; 

end  proc  PROPSOFAMAP  ; 


proc MERGE(VARSET, OBJ) ;

$ This procedure equivalences the bases of the objects in
$ VARSET which have the same gross type as OBJ to the
$ corresponding bases of OBJ. OBJ is always a composite
$ object. The bases of the objects in VARSET which have a
$ different gross type from OBJ are not equivalenced with
$ the base of OBJ. If OBJ is a set or a tuple of unknown
$ length, a single base from each object is involved. If it
$ is a map, then domain and range bases are equivalenced
$ separately. For known length mixed tuples, one base per
$ component is involved.

$ This routine is called by the routines MERGEOBJ, PROPELMT,
$ PROPOFMAP, PROPOFTUP, PROPSOFMAP, PROPSOFTUP and
$ PROPSOFAMAP.

$ This routine calls the routines EQUIV and MERGE-INTO.

$ The macros used in this routine include

$      MGTYP(OI)     - gross type of occurrence OI, see (M26).
$      ELMBASE(OI)   - base of the elements of occurrence OI,
$                    - see (M28).
$      DOMBASE(OI)   - domain base of occurrence OI, see (M29).
$      RANBASE(OI)   - range base of occurrence OI, see (M30).
$      COMBASE(OI,N) - base of the N-th component of OI, see (M31).
$      GLTYP(OI)     - length of occurrence OI, see (M27).

$  The  local  variables  defined  in  this  routine  are 

repr 

CRSET : set([∈RC-BASE, ∈OI-BASE]) ;    $ pseudo creation points
VARSET : set(∈OI-BASE) ;               $ set of variable occurrences
OI, OBJ, IV : ∈OI-BASE ;               $ variable occurrences

end repr ;

$ For sets and tuples of unknown length:

if MGTYP(OBJ) ∈ {TSET, THTUP} then
(∀ OI ∈ VARSET)

$ If OI and OBJ are of the same type, equivalence
$ the element bases of OI and OBJ.

if MGTYP(OI) = MGTYP(OBJ) then

EQUIV(ELMBASE(OI), ELMBASE(OBJ)) ;

$ If OI is an ovariable which receives a
$ retrieved value from IV, merge the mode of
$ OI with the element mode of the base of IV.

if IS-OVAR(OI) and OP-CODE(OI) in OPS-RETRIEVE
and MGTYP(IV := IFROM(OI, 1)) in
{TSET, TMAP, THTUP, TMTUP} then
MERGE-INTO(OI, IV) ;
end if IS-OVAR ;

end if MGTYP ;
end ∀ ;

$  For  maps  : 

elseif MGTYP(OBJ) = TMAP then
(∀ OI ∈ VARSET)

if MGTYP(OI) = TMAP then

$ Equivalence the domain bases and the range
$ bases of OI and OBJ, respectively.

EQUIV(DOMBASE(OI), DOMBASE(OBJ)) ;
EQUIV(RANBASE(OI), RANBASE(OBJ)) ;

$ If OI is an ovariable which receives a
$ retrieved value from IV, merge the mode of
$ OI with the element mode of the base of IV.

if IS-OVAR(OI) and OP-CODE(OI) in OPS-RETRIEVE
and MGTYP(IV := IFROM(OI, 1)) in
{TSET, TMAP, THTUP, TMTUP} then
MERGE-INTO(OI, IV) ;
end if IS-OVAR ;

end if MGTYP ;
end ∀ ;
$ For tuples of known length:

elseif MGTYP(OBJ) = TMTUP then

(∀ OI ∈ VARSET)

if MGTYP(OI) = TMTUP then

$ Equivalence the bases of each corresponding
$ component of OI and OBJ.

(∀ IX := 1...GLTYP(OBJ))

EQUIV(COMBASE(OI, IX), COMBASE(OBJ, IX)) ;
end ∀ ;

$ If OI is an ovariable which receives a
$ retrieved value from IV, merge the mode of
$ OI with the element mode of the base of IV.

if IS-OVAR(OI) and OP-CODE(OI) in OPS-RETRIEVE
and MGTYP(IV := IFROM(OI, 1)) in
{TSET, TMAP, THTUP, TMTUP} then
MERGE-INTO(OI, IV) ;
end if IS-OVAR ;

end if MGTYP ;
end ∀ ;

end if MGTYP ;

end  proc  MERGE  ; 
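The case analysis in MERGE can be illustrated with a small sketch in Python rather than SETL, so that it can be run directly. It is not the author's code: occurrences are modeled by a hypothetical `Occ` record holding a gross type and base names, bases are equivalenced with a naive union-find that stands in for EQUIV, and the MERGE-INTO step for ovariables is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Occ:
    """Hypothetical stand-in for a variable occurrence OI."""
    gross: str                  # 'TSET', 'THTUP', 'TMAP' or 'TMTUP'
    elm: str = ""               # element base (sets, tuples of unknown length)
    dom: str = ""               # domain base (maps)
    ran: str = ""               # range base (maps)
    comps: list = field(default_factory=list)  # component bases (known-length tuples)

parent: dict = {}               # PARENT map of the base forest

def find(b):
    # Naive root finding; the SETL text uses REALB, which also compresses paths.
    while b in parent:
        b = parent[b]
    return b

def equiv(b1, b2):
    # Simplified EQUIV: link one root under the other (no union by size).
    r1, r2 = find(b1), find(b2)
    if r1 != r2:
        parent[r1] = r2

def merge(varset, obj):
    """Equivalence the bases of each same-gross-type occurrence with those of obj."""
    for oi in varset:
        if oi.gross != obj.gross:
            continue            # different gross type: bases are not equivalenced
        if obj.gross in ("TSET", "THTUP"):
            equiv(oi.elm, obj.elm)
        elif obj.gross == "TMAP":
            equiv(oi.dom, obj.dom)
            equiv(oi.ran, obj.ran)
        elif obj.gross == "TMTUP":
            for b_oi, b_obj in zip(oi.comps, obj.comps):
                equiv(b_oi, b_obj)

# Example: a set and a map are merged with a set object.
s, t = Occ("TSET", elm="B1"), Occ("TSET", elm="B2")
m = Occ("TMAP", dom="B3", ran="B4")
merge([s, m], t)
print(find("B1") == find("B2"), find("B3") == find("B2"))   # True False
```

The map occurrence is skipped because its gross type differs from that of the set object, mirroring the way the SETL routine leaves such bases untouched.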
proc MERGE-INTO(OV, IV) ;

$ This routine merges the mode of OV with the element mode
$ of the base of IV. This routine is called when an
$ occurrence OV receiving a retrieved value from IV is
$ determined to be domain based and hence the elements of
$ the base of IV must be given the same mode as OV.

$ This routine is called by the routine MERGE and calls the
$ routine MODEDIS.

$ The global variables referenced by this routine include

$      BMODE(B)      - mode of base B
$      MODE(OI)      - mode of occurrence OI

$ The macros used in this routine include

$      MGTYP(OI)     - gross type of occurrence OI, see (M26).
$      COMBASE(OI,N) - base of the N-th component of OI, see (M31).
$      GLTYP(OI)     - length of occurrence OI, see (M27).
$      COMPTYP(M)    - element mode of mode descriptor M, see (M19).
$      OI-OP(OI)     - opcode of the instruction containing OI,
$                    - see (M7).

$ The local variables defined in this routine are

repr
OV, IV : ∈OI-BASE ;        $ variable occurrences
end repr ;


case  OI-OP(OV)  of 


(el-ARB, el-FROM, el-NEXT, el-INEXT) :

$ If this is an extraction from or iteration over a set,
$ merge the mode of OV with the element mode of the
$ base of IV.

if MGTYP(IV) = TSET then

COMPTYP(BMODE(ELMBASE(IV))) := COMPTYP(BMODE(ELMBASE(IV))) MODEDIS. MODE(OV) ;
end if ;

(el-NEXTD, el-INEXTD) :

$ If this is an iteration over a map, merge the mode of
$ OV with the element mode of the range base of IV.

if MGTYP(IV) = TMAP then

COMPTYP(BMODE(RANBASE(IV))) := COMPTYP(BMODE(RANBASE(IV))) MODEDIS. MODE(OV) ;
end if ;

(el-OF) :

case MGTYP(IV) of

(THTUP) :

$ If this is a value retrieval from a tuple of
$ unknown length, merge the mode of OV with the
$ element mode of the base of IV.

COMPTYP(BMODE(ELMBASE(IV))) := COMPTYP(BMODE(ELMBASE(IV))) MODEDIS. MODE(OV) ;

(TMAP) :

$ If this is a value retrieval from a map, merge the
$ mode of OV with the element mode of the range
$ base of IV.

COMPTYP(BMODE(RANBASE(IV))) := COMPTYP(BMODE(RANBASE(IV))) MODEDIS. MODE(OV) ;

(TMTUP) :

$ If this is a value retrieval from a tuple of known
$ length, merge the mode of OV with the element
$ mode of the proper component base of IV.

IX := OI-VALUE(IFROM(OV, 2)) ;
COMPTYP(BMODE(COMBASE(IV, IX))) := COMPTYP(BMODE(COMBASE(IV, IX))) MODEDIS. MODE(OV) ;

end case MGTYP ;

(el-OFB) :

$ If this is a range retrieval from a map, merge the mode
$ of OV with the range mode of IV.

if MGTYP(IV) = TMAP then

COMPTYP(BMODE(RANBASE(IV))) := COMPTYP(BMODE(RANBASE(IV))) MODEDIS. COMPTYP(MODE(OV)) ;
end  if  ; 
end  case  OI-OP  ; 

return  ; 

end  proc  MERGE-INTO  ; 


proc EQUIV(B1, B2) ;

$ This routine equivalences the bases B1 and B2 by using the
$ compressed balanced tree technique. Equivalence classes
$ are represented by a forest of trees. The root of a
$ tree is the representative (called the real base of the
$ tree) of all the bases in the tree. Trees are
$ structured by the map PARENT; PARENT(B) points to the
$ preceding node of B in the tree if B is not a root;
$ otherwise PARENT(B) is undefined.

$ When we equivalence two bases B1 and B2, we first find the
$ roots R1 and R2 of the trees containing B1 and B2
$ respectively. If R1 and R2 are the same (i.e., B1 and
$ B2 are in the same tree), nothing has to be done.
$ Otherwise, we link whichever of R1 and R2 has the tree
$ with fewer nodes to the other as a subtree.

$ Four global maps are defined on the roots of trees to
$ describe the properties of the corresponding real bases.

$      IS-FORMAL(B)  - indicates whether B is a formal base
$      NBASES(B)     - number of bases in the same class as B
$      NBASEDON(B)   - number of the sets and maps based on B
$      BMODE(B)      - mode of base B

$ PARENT is another global map defined on bases to point to
$ the preceding nodes in the tree.

$ This routine is called by the routines MERGE and
$ INSERTLOCS. It calls the routines REALB and MODEDIS.
$ The local variables defined in this routine are

repr

B1, B2, RB1, RB2 : ∈SB-BASE ;    $ bases
end repr ;

$ If the roots R1 and R2 of the trees containing B1 and B2
$ are the same, then return.

if (RB1 := REALB(B1)) = (RB2 := REALB(B2)) then
return ;

else

if NBASES(RB1) < NBASES(RB2) then

$ If the tree containing RB1 has fewer nodes than
$ the tree containing RB2, then link RB1 as a
$ subtree of RB2.

PARENT(RB1) := RB2 ;

$ Update NBASES and NBASEDON of the new tree.

NBASES(RB2) +:= NBASES(RB1) ;
NBASEDON(RB2) +:= NBASEDON(RB1) ;

$ The new tree is a formal one if the original trees
$ of RB1 and RB2 both are formal.

IS-FORMAL(RB2) := IS-FORMAL(RB2) and IS-FORMAL(RB1) ;

$ The mode of the new tree is the disjunction of the
$ modes of the original trees.

BMODE(RB2) := BMODE(RB2) MODEDIS. BMODE(RB1) ;

else

$ If the tree containing RB2 has no more nodes than
$ the tree containing RB1, then link RB2 as a
$ subtree of RB1.

PARENT(RB2) := RB1 ;

$ Update NBASES and NBASEDON of the new tree.

NBASES(RB1) +:= NBASES(RB2) ;
NBASEDON(RB1) +:= NBASEDON(RB2) ;

$ The new tree is a formal one if the original trees
$ of RB2 and RB1 both are formal.

IS-FORMAL(RB1) := IS-FORMAL(RB1) and IS-FORMAL(RB2) ;

$ The mode of the new tree is the disjunction of the
$ modes of the original trees.

BMODE(RB1) := BMODE(RB1) MODEDIS. BMODE(RB2) ;

end if NBASES ;
end if RB1 ;

return ;

end proc EQUIV ;
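A Python sketch of the union-by-size step in EQUIV follows. The `BaseInfo` record and the `info` and `parent` dictionaries are hypothetical stand-ins for the global maps IS-FORMAL, NBASES, NBASEDON, BMODE and PARENT. Modes are modeled as frozensets so that set union can play the role of MODEDIS, which is a simplification, and root finding here omits the path compression that REALB performs.

```python
from dataclasses import dataclass

@dataclass
class BaseInfo:
    """Hypothetical attributes kept on each root (real base)."""
    is_formal: bool = True          # IS-FORMAL
    nbases: int = 1                 # NBASES
    nbasedon: int = 0               # NBASEDON
    bmode: frozenset = frozenset()  # BMODE, modeled as a set of type names

parent: dict = {}                   # PARENT: an absent key means "root"
info: dict = {}                     # attributes of each base

def realb(b):
    # Root finding without path compression (REALB also compresses).
    while b in parent:
        b = parent[b]
    return b

def equiv(b1, b2):
    """Union by size: link the root of the smaller tree under the other root."""
    rb1, rb2 = realb(b1), realb(b2)
    if rb1 == rb2:
        return                      # already in the same class
    if info[rb1].nbases < info[rb2].nbases:
        child, root = rb1, rb2
    else:
        child, root = rb2, rb1      # ties link RB2 under RB1, as in the SETL code
    parent[child] = root
    r, c = info[root], info[child]
    r.nbases += c.nbases
    r.nbasedon += c.nbasedon
    r.is_formal = r.is_formal and c.is_formal
    r.bmode = r.bmode | c.bmode     # set union stands in for MODEDIS

for name in ("A", "B", "C"):
    info[name] = BaseInfo(bmode=frozenset({name.lower()}))
info["C"].is_formal = False
equiv("A", "B")
equiv("B", "C")
root = realb("C")
print(root, info[root].nbases, info[root].is_formal)   # A 3 False
```

Linking the smaller tree under the larger keeps the trees shallow, which is what makes the later root searches in REALB cheap.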
proc REALB(B) ;

$ This routine finds the real base of B. This is achieved
$ by following the map PARENT until we reach a node at
$ which PARENT is undefined. As a side effect, the tree
$ is compressed. All the nodes R1 on the path from B to
$ the root R2, except the node which is an immediate
$ descendant of the root, are re-linked to the root as
$ immediate descendants, i.e., PARENT(R1) is set to point to
$ R2. This compression procedure reduces the depth of the
$ tree and makes subsequent root finding procedures more
$ efficient.

$ This routine is called by the routine EQUIV.

$ The only global variable referenced by this routine is

$      PARENT(B)     - preceding node of B in the tree

$ The local variables defined in this routine are

repr

B, R1, R2, R3 : ∈SB-BASE ;     $ bases
WORK : set(∈SB-BASE) ;         $ workpile of bases
end repr ;

$ Initialize workpile.
WORK := nl ;

$ Let R1 point to B.
R1 := B ;

$ If B is a root, then return B.
$ Otherwise, let R2 point to the preceding node of R1.

if (R2 := PARENT(R1)) = OM then

return B ;
end if ;

$ While R2 is not a root, insert R1 into WORK and let R2
$ point to the preceding node of R1.

(while (R3 := PARENT(R2)) /= OM)

WORK with:= R1 ;

R1 := R2 ;

R2 := R3 ;
end while ;

$ Compress the tree by re-linking the nodes in WORK to R2.

(∀ R1 ∈ WORK)

PARENT(R1) := R2 ;
end ∀ ;

return R2 ;

end  proc  REALB  ; 
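The two-pointer walk in REALB can be sketched in Python as follows, assuming PARENT is a dictionary with no entry for a root (the analogue of PARENT(B) = OM). The function returns the real base of `b` and re-links every node on the path except the root's immediate child, which already points to the root.

```python
def realb(b, parent):
    """Return the real base of b, re-linking path nodes directly to it.

    parent maps each non-root node to its preceding node; roots have no entry.
    """
    work = []                       # nodes to re-link (the WORK set)
    r1 = b
    r2 = parent.get(r1)
    if r2 is None:                  # b is itself a root
        return b
    while (r3 := parent.get(r2)) is not None:
        work.append(r1)
        r1, r2 = r2, r3
    # r1 is now the immediate descendant of the root r2 and needs no change.
    for node in work:
        parent[node] = r2
    return r2

# Chain d -> c -> b -> a (a is the root).
parent = {"d": "c", "c": "b", "b": "a"}
print(realb("d", parent), parent)   # a {'d': 'a', 'c': 'a', 'b': 'a'}
```

After one call every node on the path hangs directly off the root, so a second lookup from `d` takes a single step.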


proc  INSERTLOCS(VARSET,  BASE)  ; 

$ This procedure ensures that the variable occurrences appearing
$ in VARSET are elements of the base BASE, either by
$ merging the bases of the occurrences appearing in VARSET
$ with BASE or by generating locate instructions to insert
$ the occurrences in VARSET into BASE. If an occurrence X
$ in VARSET is the ovariable of a value retrieval
$ instruction whose composite object, from which the
$ value is retrieved, has been domain based on a base B,
$ then B is equivalenced with BASE. If an occurrence X in
$ VARSET has already received a 'locate' instruction to
$ insert its value into a base B, then B is equivalenced
$ with BASE in the same way as in the previous case. For
$ other occurrences in VARSET, locate instructions are
$ generated to insert their values into BASE. Generated
$ locate instructions are collected into the global map
$ LOCINS. LOCINS maps each occurrence to the base into
$ which the occurrence is to be inserted.

$ This routine is called by the routines MERGEOBJ, PROPELMT,
$ PROPOFMAP, PROPOFTUP, PROPSOFMAP, PROPSOFTUP and
$ PROPSOFAMAP.

$ This routine calls the routines EQUIV and MODEDIS.

$ The only global variable referenced by this routine is
$      LOCINS(OI)    - the base into which OI is inserted

$ The local variables used in this routine are

repr

VARSET : set(∈OI-BASE) ;    $ set of variable occurrences
OI, OBJ : ∈OI-BASE ;        $ variable occurrences

end repr ;
(∀ OI ∈ VARSET)

if IS-OVAR(OI) and OP-CODE(OI) in OPS-RETRIEVE
and MGTYP(IV := IFROM(OI, 1)) in {TSET, TMAP, THTUP, TMTUP} then

$ If OI is the ovariable of a value retrieval
$ instruction, equivalence BASE with the base of
$ the ivariable.

case OI-OP(OI) of

(el-ARB, el-FROM, el-NEXT, el-INEXT) :

if MGTYP(IV) = TSET then
EQUIV(BASE, ELMBASE(IV)) ;
end if ;

(el-NEXTD, el-INEXTD) :

if MGTYP(IV) = TMAP then
EQUIV(BASE, DOMBASE(IV)) ;
end if ;

(el-OF) :

case MGTYP(IV) of

(THTUP) :

EQUIV(BASE, ELMBASE(IV)) ;

(TMAP) :

EQUIV(BASE, RANBASE(IV)) ;

(TMTUP) :

IX := OI-VALUE(IFROM(OI, 2)) ;
EQUIV(BASE, COMBASE(IV, IX)) ;

end case MGTYP ;
end case OI-OP ;

$ Update the element mode of BASE to indicate that
$ OI is an element of BASE.

COMPTYP(BMODE(BASE)) := COMPTYP(BMODE(BASE)) MODEDIS. MODE(OI) ;

elseif LOCINS(OI) /= OM then

$ If OI has been inserted into a base, then
$ equivalence this base with BASE.

EQUIV(BASE, LOCINS(OI)) ;

else

$ Otherwise, insert OI into BASE.

LOCINS(OI) := BASE ;

$ Update the element mode of BASE to indicate that
$ OI is an element of BASE.

COMPTYP(BMODE(BASE)) := COMPTYP(BMODE(BASE)) MODEDIS. MODE(OI) ;

end if ;
end ∀ ;
return  ; 




end  proc  INSERTLOCS  ; 
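The three cases handled by INSERTLOCS can be condensed into the following Python sketch. It is a simplification under stated assumptions: `retrieval_base` is a hypothetical callback that returns the base of the composite object an ovariable was retrieved from (summarizing the case analysis on OI-OP), `equiv` stands in for EQUIV, the dictionary `locins` plays the role of LOCINS, and the element-mode updates through MODEDIS are omitted.

```python
def insertlocs(varset, base, locins, retrieval_base, equiv):
    """Make every occurrence in varset an element of base (simplified sketch).

    locins:         dict occurrence -> base it is already located in (LOCINS)
    retrieval_base: hypothetical callback returning the base of the composite
                    object an ovariable was retrieved from, or None
    equiv:          callback that equivalences two bases (EQUIV)
    """
    for oi in varset:
        src = retrieval_base(oi)
        if src is not None:
            equiv(base, src)            # value already lives in src: share the base
        elif oi in locins:
            equiv(base, locins[oi])     # already located elsewhere: share that base
        else:
            locins[oi] = base           # generate a 'locate' of oi into base

pairs = []
locins = {"y": "B9"}
insertlocs(["x", "y", "z"], "B0", locins,
           lambda oi: "B5" if oi == "x" else None,
           lambda a, b: pairs.append((a, b)))
print(pairs, locins)   # [('B0', 'B5'), ('B0', 'B9')] {'y': 'B9', 'z': 'B0'}
```

Only the occurrence with no existing base (`z`) receives a new locate entry; the other two are absorbed by equivalencing bases instead.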


proc M1 MODEDIS. M2 ;

$ This procedure evaluates the disjunction of two mode
$ descriptors. It differs from the disjunction routine in
$ the SETL typefinder in that it handles element-of-base
$ descriptors.

$ The rules for disjunction of modes are as follows:

$ A) The disjunction of different gross types yields
$ 'general'.

$ B) The disjunction of two element-of-base modes yields an
$ element mode, and has the side-effect of equivalencing
$ the two bases.

$ C) The disjunction of two sets yields a set whose elements
$ are the disjunction of the respective element
$ descriptors of the two sets.

$ D) The disjunction of two maps yields a map whose domain
$ and range modes are the disjunction of the domain modes
$ and the range modes of the two maps.

$ E) The disjunction of two tuples yields a tuple whose
$ component modes are the disjunction of the corresponding
$ component modes of the two tuples.

$ This recursive routine is called by the routines
$ MERGE-INTO, INSERTLOCS and EQUIV. It calls the routine
$ EQUIV.

$ The macros used in this routine include

$      GROSSTYP(M)   - gross type of mode M, see Cm9).
$      IS-PRIM(M)    - true if M is a primitive mode, see (Ml?).
$      BASENAM(M)    - base name of mode descriptor M, see (M24).
$      COMPTYP(M)    - element mode of mode descriptor M, see (M19).
$      DOMTYP(M)     - domain mode of mode descriptor M, see (M22).
$      RANTYP(M)     - range mode of mode descriptor M, see (M23).
$      CTYPN(M,I)    - I-th component mode of mode descriptor M,
$                    - see (M20).
$      LENTYP(M)     - length of mode descriptor M, see (M21).

$ The local variables defined in this routine are

repr
M1, M2, DISM : ∈MODE-BASE ;    $ mode descriptors
B1, B2 : ∈SB-BASE ;            $ bases
end repr ;


$ If M1 is of zero type, the disjunction is M2.

if GROSSTYP(M1) = TZ then
return M2 ;

$ If M2 is of zero type, the disjunction is M1.

elseif GROSSTYP(M2) = TZ then

return M1 ;

$ If M1 and M2 are of different gross types, their
$ disjunction is a 'general' type.

elseif GROSSTYP(M1) /= GROSSTYP(M2) then return TGEN ;

$ If M1 and M2 are of the same primitive modes, then their
$ disjunction is M1.

elseif IS-PRIM(GROSSTYP(M1)) then return M1 ;

elseif GROSSTYP(M1) = TELMT then

$ If M1 and M2 are member basing modes, equivalence
$ their bases and return M1.

EQUIV(BASENAM(M1), BASENAM(M2)) ;
return M1 ;

else

$ Otherwise, M1 and M2 are composite modes. Construct a
$ template for the disjunction mode descriptor.

DISM := [GROSSTYP(M1)] ;

$ Construct the element mode of DISM by recursively
$ invoking this routine to derive the disjunction of
$ the element modes of M1 and M2.

case GROSSTYP(M1) of

(TSET, THTUP) :

COMPTYP(DISM) := COMPTYP(M1) MODEDIS. COMPTYP(M2) ;

(TMAP) :

COMPTYP(DISM) := [THTUP, []] ;

DOMTYP(DISM) := DOMTYP(M1) MODEDIS. DOMTYP(M2) ;

RANTYP(DISM) := RANTYP(M1) MODEDIS. RANTYP(M2) ;

(TMTUP) :

(∀ IX := 1...LENTYP(M1))

CTYPN(DISM, IX) := CTYPN(M1, IX) MODEDIS. CTYPN(M2, IX) ;
end ∀ IX ;
end case ;

return DISM ;

end if ;

end proc MODEDIS ;
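Rules A) through E) can be expressed as a recursive Python function. The encoding is an assumption made for illustration: a mode is a tuple whose first entry is its gross type, an element-of-base mode is `("TELMT", base)`, a map is `("TMAP", domain, range)`, the set `PRIMITIVE` of primitive gross types is hypothetical, and the side effect of rule B) is delegated to an `equiv` callback.

```python
TZ, TGEN = ("TZ",), ("TGEN",)
# Hypothetical primitive gross types; TGEN is included so that general joins general.
PRIMITIVE = {"TINT", "TREAL", "TSTR", "TBOOL", "TGEN"}

def modedis(m1, m2, equiv):
    """Disjunction of two mode descriptors encoded as tuples (rules A-E)."""
    g1, g2 = m1[0], m2[0]
    if g1 == "TZ":
        return m2                       # zero type is the identity
    if g2 == "TZ":
        return m1
    if g1 != g2:
        return TGEN                     # rule A: different gross types
    if g1 in PRIMITIVE:
        return m1                       # same primitive mode
    if g1 == "TELMT":
        equiv(m1[1], m2[1])             # rule B: equivalence the two bases
        return m1
    if g1 in ("TSET", "THTUP"):
        return (g1, modedis(m1[1], m2[1], equiv))                 # rule C
    if g1 == "TMAP":
        return (g1, modedis(m1[1], m2[1], equiv),                 # rule D
                modedis(m1[2], m2[2], equiv))
    if g1 == "TMTUP":
        return (g1, tuple(modedis(a, b, equiv)                    # rule E
                          for a, b in zip(m1[1], m2[1])))
    raise ValueError(f"unknown gross type {g1!r}")

merged = []
a = ("TSET", ("TELMT", "B1"))
b = ("TSET", ("TELMT", "B2"))
print(modedis(a, b, lambda x, y: merged.append((x, y))), merged)
# ('TSET', ('TELMT', 'B1')) [('B1', 'B2')]
```

Joining two sets of base elements returns a set over the first base and, as a side effect, asks for the two bases to be equivalenced, which is how disjunction feeds back into basing.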


proc  PARTITION(CRSET)  ; 

$ This procedure takes an attribute set and partitions it
$ according to the points-of-call through which attributes
$ were propagated. The RC-string which accompanies each
$ member of the set is scanned backwards, skipping over
$ completed calls until a call without a return is found.
$ These call instructions serve to partition the attribute
$ set. The procedure returns a map on call instructions.

$ This routine is called by the routines MERGEOBJ,
$ PROPOFMAP, PROPOFTUP, PROPSOFMAP, PROPSOFTUP and
$ PROPSOFAMAP. It calls the routine LASTCALL.

$ The local variables defined in this routine are

repr

CLASSES : mmap(∈RC-BASE) set(∈OI-BASE) ;  $ map RC-string into set of
                                          $ occurrences
CRSET : set([∈RC-BASE, ∈OI-BASE]) ;       $ pseudo creation points
P : ∈RC-BASE ;                            $ RC-string
OI : ∈OI-BASE ;                           $ variable occurrence

end repr ;

CLASSES := nl ;

(∀ [P, OI] ∈ CRSET) CLASSES{LASTCALL(P)} with:= OI ;;

return CLASSES ;

end  proc  PARTITION  ; 
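The grouping performed by PARTITION builds a multi-valued map keyed by the result of LASTCALL. The Python sketch below shows that pattern. LASTCALL is passed in as a function, and the `toy_lastcall` used in the example is a hypothetical simplification: its RC-strings contain calls only, and `()` plays the role of NULL-PATH.

```python
from collections import defaultdict


def partition(crset, lastcall):
    """Group occurrences by the last open call on their RC-string.

    crset    -- iterable of (rc_string, occurrence) pairs
    lastcall -- function mapping an RC-string to its last unmatched call
    Returns a dict from call to the set of occurrences (an mmap).
    """
    classes = defaultdict(set)
    for rc_string, occ in crset:
        classes[lastcall(rc_string)].add(occ)
    return dict(classes)


def toy_lastcall(rc):
    """Hypothetical stand-in: RC-strings here hold calls only."""
    return rc[-1] if rc else ()


crset = {(("c1",), "x@3"), (("c1",), "y@7"), (("c2",), "x@9"), ((), "z@1")}
print(partition(crset, toy_lastcall))
```

RC-strings are tuples in this sketch so that they can sit inside a Python set, just as pairs do in CRSET.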


proc LASTCALL(RC-STRING) ;

    $ This procedure scans RC-STRING backwards until it
    $ encounters a call not matched by a return. If RC-STRING
    $ is a NULL-PATH, it returns NULL-PATH; otherwise it
    $ returns the RC-CALL found. However, if RC-STRING is
    $ incomplete or no such call is found, it also returns
    $ NULL-PATH.

    $ This routine is called by the routine PARTITION.

    $ The local variables defined in this routine are

    repr
        RC-STRING : ∈RC-BASE ;   $ RC-string
        I : int ;                $ length of RC-string
        CALLCOUNT : int ;        $ count
    end repr ;

    $ If RC-STRING is a NULL-PATH, return NULL-PATH.

    if RC-STRING = NULL-PATH then return NULL-PATH ;

    else

        I := #RC-STRING ;
        CALLCOUNT := 0 ;

        $ Scan RC-STRING backwards.

        (while CALLCOUNT <= 0 and I > 1 doing I := I - 1 ;)
            if RC-STRING(I)(1) = RC-CALL then

                $ If the I-th component is a call, then
                $ increment the count by one.

                CALLCOUNT := CALLCOUNT + 1 ;
            else

                $ Otherwise, it is a return, and therefore the
                $ count is decremented by one.

                CALLCOUNT := CALLCOUNT - 1 ;
            end if ;
        end while ;

        if CALLCOUNT <= 0 and I = 1 then

            $ If RC-STRING is incomplete or no such call is
            $ found, return NULL-PATH.

            return NULL-PATH ;

        else

            $ Otherwise, return the call found.

            return RC-STRING(I) ;

        end if CALLCOUNT ;
    end if RC-STRING ;
end proc LASTCALL ;
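The counting scheme of LASTCALL is shown below as a minimal Python sketch. An RC-string is assumed to be a sequence of `(tag, label)` pairs. `CALL`, `RET` and `NULL_PATH` are hypothetical stand-ins for RC-CALL, the return tag and NULL-PATH. Unlike the SETL loop, which stops at I > 1, the sketch examines every element. It also does not model "incomplete" RC-strings.

```python
CALL, RET = "CALL", "RET"   # hypothetical tags for RC-CALL and returns
NULL_PATH = ()              # hypothetical stand-in for NULL-PATH


def lastcall(rc_string):
    """Return the last call on rc_string that has no matching return.

    Scanning from the end, the count rises by one for each call and
    falls by one for each return; the first element at which it becomes
    positive is the open call.
    """
    callcount = 0
    for element in reversed(rc_string):
        callcount += 1 if element[0] == CALL else -1
        if callcount > 0:
            return element
    return NULL_PATH   # every call is matched, or the string is empty


print(lastcall([(CALL, "c1"), (CALL, "c2"), (RET, "c2")]))   # ('CALL', 'c1')
print(lastcall([(CALL, "c1"), (RET, "c1")]))                 # ()
```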


proc MOVELOCS ;

    $ This procedure moves a 'locate' instruction out of a loop
    $ whenever the basing pointer which it generates is not
    $ actually used within the loop. The following case is
    $ typical: a variable X is known to be '∈B', and
    $ PS-CRTHIS{X} includes the following occurrence of X:
    $
    $     (∀I := 1..100) X := X + Y ;;
    $
    $ The procedure GENLOCS will have provisionally inserted a
    $ locate instruction within the loop for the ovariable X
    $ therein. This is clearly inappropriate, because no value
    $ of X except the last one is used as a base element. The
    $ proper place for the locate instruction is at exit from
    $ the loop, or from some containing loop.

    $ The procedure shown below systematizes the process of
    $ 'locate motion'. A 'locate' instruction can be moved
    $ out of an interval if the basing pointer which it
    $ generates is not used within that interval. This can be
    $ ascertained by following the FFROM map of the
    $ (provisionally) located variable. If we reach an
    $ operation within the interval which uses the basing
    $ pointer, then the 'locate' cannot be moved. If the use
    $ appears in some successor interval, then it will be
    $ advantageous to move the 'locate' operation to the head
    $ of that interval.

    $ In detail, we scan the FFROM chain for each occurrence OI
    $ at which a locate instruction was suggested in phase II.
    $ The scan continues until we have found all the places at
    $ which the basing pointer created at OI might potentially
    $ be used. The intervals which contain these points are
    $ called the target intervals of OI, and a map MOVETO
    $ summarizing this information is generated. If one of the
    $ target intervals of OI is the interval in which OI
    $ resides, MOVETO{OI} is defined as nl.

    $ We use MOVETO to insert actual locate instructions as
    $ follows. If MOVETO{OI} is nl, then a locate operation is
    $ inserted right after OI is created. Otherwise, for each
    $ interval INT in MOVETO{OI}, a locate operation is inserted
    $ at the entry to the largest interval which includes INT
    $ but not OI.

    $ This routine is called by the main routine AUTO-DATA.
    $ It calls the routines INTMAX, INS-AFTER and INS-TARG.

    $ The global variables referenced by this routine include
    $     LOCINS(OI)     - the base into which OI is inserted
    $     MODE(OI)       - mode of occurrence OI
    $     BASE-ELMTS     - sets of occurrences which are
    $                      known to be elements of bases

    $ The macros used in this routine include
    $     IS-OVAR(OI)    - true if occurrence OI is an ovariable,
    $                      see (M11).
    $     IS-HASHED(OI)  - true if OI is subject to operations
    $                      involving hashing, see (M13).
    $     OFROMI(OI)     - the ovariable in the same instruction
    $                      as the ivariable OI, see (M15).

    $ The local variables defined in this routine are

    repr
        WORK : set(∈OI-BASE) ;                         $ workpile of occurrences
        OI, WOI, U, NU, NEWOVAR, NEWIVAR : ∈OI-BASE ;  $ variable occurrences
        USES : set([∈RC-BASE, ∈OI-BASE]) ;             $ workpile
        P, NP, NNP : ∈RC-BASE ;                        $ RC-strings
        I, NEWI : int ;                                $ instruction identifiers
        MOVETO : mmap(∈OI-BASE) set([∈RC-BASE, ∈OI-BASE]) ;
                                                       $ map occurrences to target
                                                       $ intervals
        BASE : ∈SB-BASE ;                              $ base
    end repr ;

    $ Initialize BASE-ELMTS.
    BASE-ELMTS := nl ;

    $ For all occurrences to be inserted into bases

    (∀[OI, BASE] ∈ LOCINS | not CAN-DROP(BASE))
        WORK := {OI} ;        $ workpile
        (while WORK /= nl)

            WOI from WORK ;
            USES := FFROM{WOI} ;

            $ We follow FFROM until we pass out of the
            $ interval, or until we find a hashing use of the
            $ variable.

            (while USES /= nl)

                [P, U] from USES ;

                $ If the interval containing U is different from
                $ the interval containing WOI, then prepare to
                $ move the locate instruction from WOI toward U,
                $ and stop chaining at U.

                if OI-INTOV(U) /= OI-INTOV(WOI) then
                    MOVETO{WOI} with [P, U] ;
                    NEWUSES := nl ;

                $ If U is subject to an instruction involving a
                $ hashing operation, the locate instruction cannot
                $ be moved.

                elseif IS-HASHED(U) then
                    MOVETO{WOI} := nl ;
                    continue ∀ ;

                $ If the occurrence U we are tracing is assigned
                $ to another, we must also trace the target of the
                $ assignment, for possible use of the basing
                $ pointer thus transmitted.

                elseif OI-OP(U) in OPS-ASN then

                    NEWUSES := FFROM{OFROMI(U)} + FFROM{U} ;

                else  $ Continue chaining.

                    NEWUSES := FFROM{U} ;
                end if ;

                $ The elements of NEWUSES which are actually
                $ chained to the original variable must be
                $ processed.

                USES + {[NNP, NU] : [NP, NU] ∈ NEWUSES |
                            (NNP := P CC. NP) /= ERROR-PATH} ;
            end while USES ;
        end while WORK ;
    end ∀[OI ;

    $ Now perform code insertion. First process the 'locate'
    $ instructions which were not moved. The insertion to be
    $ performed is indicated by an assignment statement from
    $ an occurrence having primitive mode to an occurrence
    $ having member basing mode.

    (∀[OI, BASE] ∈ LOCINS | MOVETO{OI} = nl and not CAN-DROP(BASE))
        I := INSTNO(OI) ;

        $ Physically insert a locate instruction after
        $ instruction I.

        NEWARGS := [OI-NAME(OI), OI-NAME(OI)] ;
        NEWI := INS-AFTER(I, Q1-ASN, NEWARGS) ;
        NEWOVAR := [NEWI, 1] ;
        NEWIVAR := [NEWI, 2] ;

        $ Assign proper modes to the variable occurrences in
        $ this new instruction.

        MODE(NEWOVAR) := [TELMT, NULL-PATH, BASE] ;
        MODE(NEWIVAR) := MODE(OI) ;

        $ Update the set BASE-ELMTS.

        BASE-ELMTS with [NEWOVAR, BASE] ;

        $ Update BFROM and FFROM.

        FFROM{NEWOVAR} := FFROM{OI} ;
        FFROM{OI} := {[NULL-PATH, NEWIVAR]} ;
        BFROM{NEWIVAR} := {[NULL-PATH, OI]} ;
    end ∀ ;

    $ Now process the occurrences in MOVETO. The optimal point
    $ for inserting the new instruction is the head of the
    $ largest interval which contains the target occurrence
    $ and which does not contain the original occurrence
    $ (whose locate has been moved). The problem of finding
    $ such an interval also arises in connection with copy
    $ optimization. Here we use several utility procedures
    $ taken from that module:

    $ INTMAX   : finds the largest interval in a given sequence of
    $            derived intervals which does not contain a given
    $            variable occurrence.

    $ INS-TARG : inserts a new instruction in the target block
    $            of the chosen interval, and returns OM if such an
    $            instruction is already in the target block.

    (∀VARSET := MOVETO{OI})
        BASE := LOCINS(OI) ;

        (∀[P, V] ∈ VARSET)

            $ Calculate the intervals of the derived sequence
            $ which contain V.

            INTSEQ := [ID : ID := OI-INTOV(V) while ID /= OM
                            doing ID := INTOV(ID)] ;

            $ Find the target block of the last interval of
            $ INTSEQ.

            TARG := INTMAX(OI, P, #INTSEQ + 1, INTSEQ) ;

            $ Insert a locate instruction into this target
            $ block.

            I := INS-TARG(INTSEQ(TARG), Q1-ASN, [OI-NAME(V), OI-NAME(V)]) ;

            $ Once the instruction is successfully inserted, set
            $ the proper modes of the new occurrences to indicate
            $ the locate operation to be performed.

            if I /= OM then

                NEWOVAR := [I, 1] ;        $ ovariable
                NEWIVAR := [I, 2] ;        $ ivariable

                MODE(NEWOVAR) := [TELMT, OM, BASE] ;
                MODE(NEWIVAR) := MODE(OI) ;

                $ Indicate that the new ovariable has element
                $ basing.

                BASE-ELMTS with [NEWOVAR, BASE] ;
            end if ;
        end ∀[P ;
    end ∀VARSET ;

    return ;

end proc MOVELOCS ;
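The interval choice made by INTSEQ and INTMAX can be shown with a simplified Python sketch. It asks for the largest interval that contains the target use but not the creating occurrence OI. Several assumptions apply. Intervals are named by strings. The derived-sequence step INTOV is modelled as a hypothetical `parent` dict, with no entry for the outermost interval. The RC-string P, which the listing also passes to INTMAX, is ignored.

```python
def interval_chain(interval, parent):
    """Derived-interval sequence from `interval` outward (innermost first)."""
    chain = []
    while interval is not None:
        chain.append(interval)
        interval = parent.get(interval)   # INTOV-like step; None at the top
    return chain


def locate_target(oi_interval, use_interval, parent):
    """Largest interval containing the use but not the creating occurrence.

    Returns None when the innermost interval of the use already contains
    OI, i.e. the locate must stay right after OI.
    """
    contains_oi = set(interval_chain(oi_interval, parent))
    target = None
    for iv in interval_chain(use_interval, parent):
        if iv in contains_oi:   # this and every larger interval hold OI
            break
        target = iv
    return target


# A loop body nested in a loop, followed by a loop-exit interval;
# both the loop and the exit lie in the procedure's top interval.
parent = {"body": "loop", "loop": "top", "exit": "top"}
print(locate_target("body", "exit", parent))   # exit
print(locate_target("body", "body", parent))   # None
```

Because derived intervals are nested, the first interval on the use's chain that also holds OI bounds the search. Every larger interval must hold OI as well, so the loop can stop there.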


proc UPDMODES ;

    $ This constitutes the fourth phase of the data structure
    $ choice algorithm. We adjust the modes of occurrences in
    $ three steps:

    $ A) For composite objects and member based objects, we
    $    adjust their modes using the procedure SUBSTMD, to
    $    replace references to dropped bases by the modes of their
    $    elements, and also to replace references to non-real
    $    bases by references to real bases.

    $ B) At each occurrence, we determine whether member basings
    $    and type information (possibly involving domain basings)
    $    are useful by examining the subsequent uses of the value
    $    appearing at this occurrence. Two indicators,
    $    NON-HASH-USE and HASH-USE, are generated for this
    $    purpose.

    $ C) We then propagate member basing pointers from locate
    $    and value retrieval instructions to other occurrences
    $    which need basings. The propagation procedure ensures
    $    that proper basings are carried with the variable
    $    values.

    $ This routine is called by the main routine AUTO-DATA. It
    $ calls the routines MODECMPRS, USE-DETERM and
    $ BASING-PROP.

    MODECMPRS() ;
    USE-DETERM() ;
    BASING-PROP() ;

    return ;

end proc UPDMODES ;


proc MODECMPRS ;

    $ This procedure adjusts the basing modes of variable
    $ occurrences, using the procedure SUBSTMD. Basings
    $ referencing non-real bases are adjusted to reference
    $ real bases. Basings referencing dropped bases are
    $ replaced by references to the element modes of those
    $ bases.

    $ The purpose of introducing dummy bases for tuples and for
    $ the range of maps was to use these bases as markers for
    $ possibly complex structures which may themselves be
    $ based. These markers are replaced by the corresponding
    $ structures by means of the procedure SUBSTMD described
    $ above. A frequent and important case where this
    $ mechanism is useful is that of multivariate maps. The
    $ operation
    $
    $     F(X, Y) := Z ;
    $
    $ expands into the following sequence of univariate
    $ retrievals and storages:
    $
    $     T := F{X} ;
    $     T(Y) := Z ;
    $     F{X} := T ;
    $
    $ These generate the following basings:
    $
    $     F : map(∈B1) ∈B2
    $     T : map(∈B3) ∈B4
    $
    $ Our system will also produce the locate instructions
    $
    $     B2 with T ;
    $     B3 with Y ;
    $     B4 with Z ;
    $
    $ If F is only used as a bivariate map, then B2 will
    $ eventually be recognized to be useless, and after
    $ determining that the mode of B2 is 'map(∈B3) ∈B4', the
    $ final mode for F will be
    $
    $     F : map(∈B1) map(∈B3) ∈B4 ;
    $
    $ which is the desired descriptor.

    $ In general there will be fewer modes to adjust than
    $ variable occurrences. It is therefore economical to map
    $ the modes themselves into their final forms, and then to
    $ use this map to update the modes of variables.

    $ This routine is called by the routine UPDMODES and calls
    $ the routine SUBSTMD.

    $ The global variables referenced by this routine include
    $     ALL-OI       - all variable occurrences
    $     MODE(OI)     - mode of occurrence OI

    $ The macros used in this routine include
    $     MGTYP(OI)    - gross type of occurrence OI, see (M26).
    $     IS-PRIM(M)   - true if M is a primitive mode, see (M17).

    $ The local variables defined in this routine are

    repr
        NEWMODE : smap(∈MODE-BASE) ∈MODE-BASE ;  $ temporary map from mode
                                                 $ descriptor to mode descriptor
        OI : ∈OI-BASE ;                          $ variable occurrence
        M : ∈MODE-BASE ;                         $ mode descriptor
    end repr ;

    $ Initialize NEWMODE.

    NEWMODE := NL ;

    $ Update non-primitive mode descriptors.

    (∀M ∈ range MODE | not IS-PRIM(GROSSTYP(M)))
        NEWMODE(M) := SUBSTMD(M) ;
    end ∀ ;

    $ Update the non-primitive modes of variable occurrences.

    (∀OI ∈ ALL-OI | not IS-PRIM(MGTYP(OI)))
        MODE(OI) := NEWMODE(MODE(OI)) ;
    end ∀ ;
    return ;
end proc MODECMPRS ;
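The economy argument of MODECMPRS is to rewrite each distinct mode once and then remap the occurrences. A minimal Python sketch of that pattern follows. It assumes modes are hashable tuples. SUBSTMD and IS-PRIM are passed in as functions, and `toy_substmd` is a hypothetical rewrite that maps `∈B1` to `∈B2` and counts its own invocations.

```python
def compress_modes(mode_of, substmd, is_prim):
    """Rewrite each distinct non-primitive mode once, then remap occurrences.

    mode_of -- dict: occurrence -> mode (modes must be hashable)
    substmd -- function giving the final form of a mode
    is_prim -- predicate on modes
    """
    new_mode = {m: substmd(m) for m in set(mode_of.values()) if not is_prim(m)}
    return {oi: new_mode.get(m, m) for oi, m in mode_of.items()}


calls = []


def toy_substmd(m):
    calls.append(m)                     # record how often the rewrite runs
    return ("ELMT", "B2") if m == ("ELMT", "B1") else m


modes = {"x@1": ("ELMT", "B1"), "x@4": ("ELMT", "B1"),
         "y@2": ("ELMT", "B1"), "n@3": ("INT",)}
print(compress_modes(modes, toy_substmd, lambda m: m[0] == "INT"))
print(len(calls))   # 1: three occurrences share one mode
```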


proc SUBSTMD(M) ;

    $ This procedure updates mode descriptors by replacing
    $ references to dropped bases by the mode descriptors for
    $ the corresponding base elements. It also replaces base
    $ names by the names of their representatives, so that the
    $ updated mode contains only references to real bases.

    $ As a side effect, the modes of real bases are also updated
    $ when they are referenced for the first time. For
    $ example, when we update a mode '∈B1' where the real base
    $ of B1 is B2, whose mode was 'base([∈B3, bool])', and where
    $ B3, whose mode was 'base(int)', is dropped, this routine
    $ updates the mode of B2 to be 'base([int, bool])' and then
    $ replaces the mode '∈B1' by '∈B2'. This updating of base
    $ modes makes subsequent mode updates more efficient,
    $ especially when they refer to bases which have been
    $ referenced before.

    $ This recursive routine is called by the routine MODECMPRS.

    $ The global variables referenced by this routine include
    $     REALB(B)     - representative of the equivalence class of B
    $     CAN-DROP(B)  - true if base B can be dropped

    $ The macros used in this routine include
    $     GROSSTYP(M)  - gross type of mode M, see (M19).
    $     IS-PRIM(M)   - true if M is a primitive mode, see (M17).
    $     BASENAM(M)   - base name of mode descriptor M, see (M24).
    $     COMPTYP(M)   - element mode of mode descriptor M, see (M19).
    $     DOMTYP(M)    - domain mode of mode descriptor M, see (M22).
    $     CTYPN(M,I)   - I-th component mode of mode descriptor M,
    $                    see (M20).
    $     RANTYP(M)    - range mode of mode descriptor M, see (M23).
    $     LENTYP(M)    - length of mode descriptor M, see (M21).

    $ The local variables defined in this routine are

    repr
        B : ∈SB-BASE ;        $ base
        M : ∈MODE-BASE ;      $ mode descriptor
        IX : int ;            $ index
    end repr ;

    $ If M is a primitive mode, then return M.

    if IS-PRIM(GROSSTYP(M)) then return M ;

    else case GROSSTYP(M) of

        $ For a member basing mode M, update the mode of the
        $ real base B of M. If B can be dropped, then
        $ reassign to M the mode of the elements of B; otherwise
        $ complete the BASENAM component of M.

        (TELMT) :

            B := REALB(BASENAM(M)) ;
            COMPTYP(BMODE(B)) := SUBSTMD(COMPTYP(BMODE(B))) ;

            if CAN-DROP(B) then
                M := COMPTYP(BMODE(B)) ;
            else
                BASENAM(M) := B ;
            end if ;

        $ For other composite modes, update the element mode of
        $ M by recursively invoking this routine.

        (TBASE, TSET, THTUP) : COMPTYP(M) := SUBSTMD(COMPTYP(M)) ;

        (TMAP) : DOMTYP(M) := SUBSTMD(DOMTYP(M)) ;
                 RANTYP(M) := SUBSTMD(RANTYP(M)) ;

        (TMTUP) :
            (∀IX := 1...LENTYP(M))
                CTYPN(M, IX) := SUBSTMD(CTYPN(M, IX)) ;
            end ∀ ;

    end case ;
    return M ;

    end if ;
end proc SUBSTMD ;
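The substitution done by SUBSTMD can be reproduced in a minimal Python sketch. It assumes the same hypothetical tuple encoding of modes ("ELMT", "SET", "HTUP", "MAP", "MTUP" tags) and models three globals as plain dicts and sets: REALB (missing entries mean a base is its own representative), the element mode of each real base (COMPTYP(BMODE(B))), and CAN-DROP. The sketch also assumes that no base's element mode refers back to the base itself. As in the listing, the mode of a real base is rewritten as a side effect.

```python
def substmd(m, realb, bmode, can_drop):
    """Replace dropped bases by their element modes and other bases by
    their real representatives.

    realb    -- dict: base -> representative base (missing = itself)
    bmode    -- dict: real base -> element mode (updated in place)
    can_drop -- set of droppable real bases
    """
    kind = m[0]
    if kind == "ELMT":
        b = realb.get(m[1], m[1])
        bmode[b] = substmd(bmode[b], realb, bmode, can_drop)   # side effect
        return bmode[b] if b in can_drop else ("ELMT", b)
    if kind in ("SET", "HTUP"):
        return (kind, substmd(m[1], realb, bmode, can_drop))
    if kind == "MAP":
        return (kind, substmd(m[1], realb, bmode, can_drop),
                substmd(m[2], realb, bmode, can_drop))
    if kind == "MTUP":
        return (kind, tuple(substmd(c, realb, bmode, can_drop) for c in m[1]))
    return m                                                   # primitive


# The worked example in the SUBSTMD comments: B1 is represented by B2,
# whose elements are pairs [∈B3, bool]; B3 (elements int) is dropped.
realb = {"B1": "B2"}
bmode = {"B2": ("MTUP", (("ELMT", "B3"), ("BOOL",))), "B3": ("INT",)}
print(substmd(("ELMT", "B1"), realb, bmode, {"B3"}))  # ('ELMT', 'B2')
print(bmode["B2"])   # ('MTUP', (('INT',), ('BOOL',)))
```

The same function handles the bivariate-map case discussed for MODECMPRS. With B2 dropped and its element mode `map(∈B3) ∈B4`, the descriptor `map(∈B1) ∈B2` becomes `map(∈B1) map(∈B3) ∈B4`.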

proc USE-DETERM ;

    $ This procedure determines whether member basing and/or
    $ domain basing should be carried at variable occurrences.
    $ If the value flow shows that the value of an occurrence
    $ will subsequently be subject to a membership test
    $ (explicit or implicit), member basing pointers should be
    $ kept; this is flagged by setting HASH-USE to true.
    $ Similarly, if the value flow shows that the type
    $ information or domain basing is useful in subsequent
    $ uses, type information (with domain basing) should be
    $ kept; this is flagged by setting NON-HASH-USE to true.
    $ Note that both kinds of information might be carried
    $ along with certain occurrences.

    $ This routine is called by the routine UPDMODES.

    $ The global variables referenced by this routine include
    $     ALL-IVAR          - all ivariables
    $     BFROM{OI}         - occurrences to which OI is directly linked
    $     HASH-USE(OI)      - true if member basing should be kept for
    $                         occurrence OI
    $     NON-HASH-USE(OI)  - true if domain basing should be kept for
    $                         occurrence OI

    $ The macros used in this routine include
    $     MGTYP(OI)     - gross type of occurrence OI, see (M26).
    $     IS-OVAR(OI)   - true if occurrence OI is an ovariable,
    $                     see (M11).
    $     OI-OP(OI)     - opcode of the instruction which contains
    $                     occurrence OI, see (M7).
    $     IFROMO(OV,I)  - I-th ivariable of the instruction which
    $                     contains ovariable OV, see (M15).

    $ The local variables defined in this routine are

    repr
        WORK : set(∈OI-BASE) ;          $ workpile of occurrences
        IV, OI : ∈OI-BASE ;             $ variable occurrences
        POI : [∈RC-BASE, ∈OI-BASE] ;    $ pair of RC-string and occurrence
        P : ∈RC-BASE ;                  $ RC-string
    end repr ;

    CONST U-MEMB, U-TYPE ; end ;        $ usage information to be propagated

    $ Initialize WORK, HASH-USE and NON-HASH-USE.

    WORK := NL ;
    HASH-USE := NL ;
    NON-HASH-USE := NL ;

    (∀IV ∈ ALL-IVAR)

        $ The ivariables which need member basing pointers
        $ have been assigned TELMT mode.

        if MGTYP(IV) = TELMT then
            HASH-USE(IV) := TRUE ;

            $ Prepare to propagate information backwards through
            $ the BFROM chain.

            WORK + {[POI, U-MEMB] : POI ∈ BFROM{IV}} ;

        $ Otherwise, unless IV belongs to an assignment
        $ instruction, it must be subject to an operation
        $ involving global structure iteration, for which type
        $ information (possibly involving domain basing) is
        $ generally useful.

        elseif OI-OP(IV) notin OPS-ASN then
            NON-HASH-USE(IV) := TRUE ;

            $ Prepare to propagate information backwards through
            $ the BFROM chain.

            WORK + {[POI, U-TYPE] : POI ∈ BFROM{IV}} ;
        end if ;

    end ∀ ;

    $ Usage information is then propagated backwards through
    $ the BFROM chain and assignment instructions.

    (while WORK /= NL)

        [[P, OI], USE] from WORK ;

        $ Determine usage information for the ovariables and
        $ the ivariables of assignment instructions.

        if IS-OVAR(OI) or OI-OP(OI) in OPS-ASN then

            case USE of

            (U-MEMB) :

                $ If the HASH-USE flag of OI is already on, then OI
                $ need not be processed; otherwise turn on the
                $ flag.

                if HASH-USE(OI) then
                    continue while ;
                else
                    HASH-USE(OI) := TRUE ;
                end if ;

            (U-TYPE) :

                $ If the NON-HASH-USE flag of OI is already on, then
                $ OI need not be processed; otherwise turn on
                $ the flag.

                if NON-HASH-USE(OI) then
                    continue while ;
                else
                    NON-HASH-USE(OI) := TRUE ;
                end if ;

            end case ;

            $ Propagate through assignment instructions.

            if IS-OVAR(OI) and OI-OP(OI) in OPS-ASN then
                WORK with [[P, IFROMO(OI, 1)], USE] ;
            end if IS-OVAR ;

        end if IS-OVAR ;

        $ Propagate backwards through the BFROM chain.

        WORK + {[[P, OI], USE] : [P, OI] ∈ BFROM{OI} |
                    if USE = U-MEMB then not HASH-USE(OI)
                    elseif USE = U-TYPE then not NON-HASH-USE(OI)} ;

    end while ;
    return ;

end proc USE-DETERM ;
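The backward propagation in USE-DETERM is a worklist closure over the value-flow graph. The simplified Python sketch below computes it once per flag. It assumes BFROM is a dict from each occurrence to the occurrences whose values reach it, and that assignments are described by a hypothetical `assign_src` dict from an assignment's ovariable to its ivariable (the role of IFROMO(OI, 1)). It ignores RC-strings and flags every occurrence reached, whereas the listing sets flags only on ovariables and assignment ivariables. The occurrence names such as `s@1` are invented for the example.

```python
from collections import deque


def propagate_use(seeds, bfrom, assign_src):
    """Backward closure of a usage flag over the value-flow graph.

    seeds      -- ivariables whose use needs the information
    bfrom      -- dict: occurrence -> occurrences whose values reach it
    assign_src -- dict: ovariable of an assignment -> its ivariable
    Returns the set of occurrences on which the flag ends up set.
    """
    flagged = set(seeds)
    work = deque(seeds)
    while work:
        oi = work.popleft()
        preds = set(bfrom.get(oi, ()))
        if oi in assign_src:              # look through the assignment
            preds.add(assign_src[oi])
        for p in preds:
            if p not in flagged:          # already flagged: stop here
                flagged.add(p)
                work.append(p)
    return flagged


# S := {...} (s@1); T := S (t@2 := s@2); x in T (t@3); (∀y ∈ S) (s@4)
bfrom = {"s@2": {"s@1"}, "t@3": {"t@2"}, "s@4": {"s@1"}}
assign_src = {"t@2": "s@2"}
hash_use = propagate_use({"t@3"}, bfrom, assign_src)
non_hash_use = propagate_use({"s@4"}, bfrom, assign_src)
print(sorted(hash_use))       # ['s@1', 's@2', 't@2', 't@3']
print(sorted(non_hash_use))   # ['s@1', 's@4']
```

Here the creation point s@1 receives both flags, which is the situation noted in the comments where both kinds of information must be carried.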


proc BASING-PROP ;

    $ This procedure propagates member basing pointers from
    $ inserted locate instructions and value retrieval
    $ instructions to other variable occurrences, wherever
    $ necessary. In order to allow multiple representations
    $ for a variable occurrence, the map MODE is used as a
    $ multi-valued map.

    $ This routine is called by the routine UPDMODES.

    $ The global variables referenced by this routine include
    $     BLOCKS        - set of code blocks
    $     MODE{OI}      - modes of occurrence OI
    $     ARG(I)        - arguments of instruction I

    $ The macros used in this routine include
    $     FORALLCODE(B, I) - for each instruction I in block B,
    $                        see (M4).
    $     MGTYP(OI)     - gross type of occurrence OI, see (M26).
    $     COMPTYP(M)    - element mode of mode descriptor M, see (M19).
    $     DOMTYP(M)     - domain mode of mode descriptor M, see (M22).
    $     RANTYP(M)     - range mode of mode descriptor M, see (M23).
    $     CTYPN(M, I)   - I-th component mode of mode descriptor M,
    $                     see (M20).
    $     OI-VALUE(OI)  - value of occurrence OI, see (M9).

    $ The local variables defined in this routine are

    repr
        I : int ;                              $ instruction identifier
        OV, IV1, IV2 : ∈OI-BASE ;             $ variable occurrences
        DONE : set([∈OI-BASE, ∈MODE-BASE]) ;  $ occurrences which have been
                                              $ processed
        WORK : set([∈OI-BASE, ∈MODE-BASE]) ;  $ workpile of occurrences with
                                              $ member basings
        M, MD : ∈MODE-BASE ;                   $ mode descriptors
    end repr ;

    $ Initialize the workpile.

    WORK := NL ;

    $ For each instruction

    (∀B ∈ BLOCKS, FORALLCODE(B, I))

        [OV, IV1, IV2] := ARG(I) ;    $ Unpack arguments.

        $ If this instruction has no ovariable, then continue
        $ with the next instruction.

        if OV = OM then continue ∀ ;

        $ At this point, the ovariables of inserted locate
        $ instructions will have been assigned member basing
        $ modes, and these are the only ovariable occurrences
        $ which have been assigned such modes (by the MODE map).
        $ The mode map of all other ovariable occurrences will
        $ contain domain basings.

        M := OM ;

        if MGTYP(OV) = TELMT then

            $ Insert [OV, MODE(OV)] into the workpile to indicate
            $ that OV has a member basing pointer.

            WORK with [OV, MODE(OV)] ;

            $ If OV also needs domain basing, then OV is
            $ assigned multiple basings by absorbing the mode
            $ of its ivariable.

            if NON-HASH-USE(OV) then
                MODE{OV} + MODE{IV1} ;
            end if ;

        else

            $ Member basing pointers can also be transmitted by
            $ value retrieval operations.

            case OPCODE(I) of

            (Q1-ARB, Q1-FROM, Q1-NEXT, Q1-INEXT) :

                if MGTYP(IV1) = TSET and ELMBASE(IV1) /= OM then
                    $ For a value retrieval from a based set:
                    M := COMPTYP(MODE(IV1)) ;
                end if ;

            (Q1-NEXTD, Q1-INEXTD) :

                if MGTYP(IV1) = TMAP and DOMBASE(IV1) /= OM then
                    $ For an iteration over a based map:
                    M := DOMTYP(MODE(IV1)) ;
                end if ;

            (Q1-OF) :

                case MGTYP(IV1) of

                (TMAP) :
                    if DOMBASE(IV1) /= OM then
                        $ For a value retrieval from a based map:
                        M := RANTYP(MODE(IV1)) ;
                    end if ;

                (THTUP) :
                    if ELMBASE(IV1) /= OM then
                        $ For a value retrieval from a based
                        $ homogeneous tuple, whose length is unknown:
                        M := COMPTYP(MODE(IV1)) ;
                    end if ;

                (TMTUP) :
                    if ELMBASE(IV1) /= OM then
                        $ For a value retrieval from a based
                        $ tuple of known length:
                        M := CTYPN(MODE(IV1), OI-VALUE(IV2)) ;
                    end if ;

                end case MGTYP ;

            end case OPCODE ;

            $ If M is not undefined, then insert [OV, M] into the
            $ workpile to indicate that member basing M is
            $ available at OV.

            if M /= OM then WORK with [OV, M] ; end if ;
        end if MGTYP ;
    end ∀ ;

    $ Propagate member basings through the FFROM chain and
    $ assignment instructions.

    DONE := NL ;

    $ WORK contains the occurrences with member basings.

    (while WORK /= NL)

        [OI, M] from WORK ;

        $ Assign the proper mode to OI according to HASH-USE(OI)
        $ and NON-HASH-USE(OI).

        if HASH-USE(OI) then

            if NON-HASH-USE(OI) then

                $ If both HASH-USE(OI) and NON-HASH-USE(OI) are
                $ true, then assign OI member basing in
                $ addition to domain basing.

                MODE{OI} with M ;

            else




                $ If only HASH-USE(OI) is true, then assign OI
                $ member basing only.

                MODE{OI} := {M} ;

            end if NON-HASH-USE ;

            $ Indicate that OI has been processed, and insert the
            $ occurrences which are linked to OI and have not
            $ been processed into the workpile.

            DONE with [OI, M] ;

            WORK + {[WOI, M] : [-, WOI] ∈ FFROM{OI} | [WOI, M] notin DONE} ;

            $ If OI is the ivariable of an assignment
            $ instruction, then insert the ovariable into WORK
            $ if it has not been processed.

            if OI-OP(OI) in OPS-ASN and IS-IVAR(OI)
                    and [(OV := OFROMI(OI)), M] notin DONE then
                WORK with [OV, M] ;
            end if OI-OP ;
        end if HASH-USE ;

        $ Otherwise, NON-HASH-USE(OI) must be true and OI
        $ already has domain basing. Nothing has to be done.

    end while WORK ;

    return ;

end proc BASING-PROP ;
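The second phase of BASING-PROP, which pushes (occurrence, member-basing mode) pairs forward through FFROM and assignments while a DONE set prevents repeated work, can be sketched in Python as follows. The sketch assumes hashable tuple modes and a MODE map held as a dict of sets. It also assumes a hypothetical `assign_dst` dict from an assignment's ivariable to its ovariable (the role of OFROMI). The seeding phase, which scans locate and value-retrieval instructions, is not modelled, and the occurrence names are invented.

```python
def propagate_basings(seeds, ffrom, assign_dst, hash_use, non_hash_use, modes):
    """Forward propagation of member basings.

    seeds      -- (occurrence, member-basing mode) pairs known at start
    ffrom      -- dict: occurrence -> occurrences its value flows to
    assign_dst -- dict: assignment ivariable -> ovariable of that assignment
    modes      -- dict: occurrence -> set of modes (multi-valued MODE map),
                  updated in place and returned
    """
    done = set()
    work = set(seeds)
    while work:
        oi, m = work.pop()
        if oi not in hash_use:
            continue                             # domain basing only
        if oi in non_hash_use:
            modes.setdefault(oi, set()).add(m)   # keep both kinds of basing
        else:
            modes[oi] = {m}                      # member basing only
        done.add((oi, m))
        succs = set(ffrom.get(oi, ()))
        if oi in assign_dst:
            succs.add(assign_dst[oi])
        work |= {(w, m) for w in succs if (w, m) not in done}
    return modes


elmt_b = ("ELMT", "B")
modes = {"t@7": {("MAP", "domain-based")}}
ffrom = {"t@5": {"t@6"}, "t@6": {"t@7"}}
propagate_basings({("t@5", elmt_b)}, ffrom, {}, {"t@5", "t@6", "t@7"}, {"t@7"}, modes)
print(modes["t@6"])   # {('ELMT', 'B')}
```

Keeping WORK as a set, as the listing does, means a pending pair is never queued twice. The DONE check also guarantees termination on cyclic value flow.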
proc REFINE ;

    $ This procedure applies the heuristics explained in an
    $ earlier chapter to choose between local, remote and
    $ sparse representations for based objects. These
    $ heuristics amount to the following:

    $ A) A based object should be sparse if it is to be iterated
    $    over, unless we can show that the object is actually
    $    identical with its base. The routine ID-BASE will spot
    $    such identities in a few potentially useful cases.

    $ B) If no iteration is performed on an object, but it is
    $    subject to algebraic operations (union, intersection,
    $    etc.), or is passed as a parameter, assigned and used
    $    destructively, or inserted into a larger object, then it
    $    should be remote.

    $ C) If only differential updating operations are applied to
    $    an object, and it is never transmitted to another by
    $    assignment, insertion or call, then it can have a local
    $    representation.

    $ This routine is called by the main routine AUTO-DATA. It
    $ calls the routines ID-BASE and MAKE-REMOTE.

    $ The global variables referenced by this routine include
    $     LIVEPDS      - the set of live periods
    $     MODE{OI}     - modes of occurrence OI

    $ The macros used in this routine include
    $     GROSSTYP(M)  - gross type of mode M, see (M19).
    $     REPRATT(M)   - representation attribute of domain basing M,
    $                    see (M25).
    $     OI-OP(OI)    - opcode of the instruction which contains
    $                    occurrence OI, see (M7).

    $ The local variables defined in this routine are

    repr
        LPD : ∈LPD-BASE ;     $ live period
        ATTRIB : atom ;       $ representation attribute
        OI : ∈OI-BASE ;       $ variable occurrence
        MD : ∈MODE-BASE ;     $ mode descriptors
    end repr ;

    $ Find the occurrences which are identical in value with
    $ their bases.

    ID-BASE() ;

    $ For each live period of composite objects which are
    $ domain based

    (∀LPD ∈ LIVEPDS | (∃MD ∈ MODE{arb LPD} | GROSSTYP(MD) in {TMAP, TSET}))

        $ MD is the domain basing of a set or map.

        if MD = OM then continue ;;

        $ If there is an occurrence in the live period subject
        $ to iteration operations and not identical with its
        $ base, then all the occurrences in the live period
        $ are assigned sparse representations.

        if (∃OI ∈ LPD | (OI-OP(OI) in {Q1-NEXT, Q1-NEXTD}
                and not ID-TO-BASE(OI))) then
            ATTRIB := SPARSE ;

        $ If any occurrence in the live period should have a
        $ remote representation, all the occurrences in this
        $ live period are assigned remote representations.

        elseif ∃OI ∈ LPD | MAKE-REMOTE(OI) then
            ATTRIB := REMOTE ;

        else

            ATTRIB := LOCAL ;

        end if ;

        $ Assign the calculated attribute to the mode of each
        $ occurrence in the live period.

        (∀OI ∈ LPD | (∃MD ∈ MODE{OI} | GROSSTYP(MD) in {TMAP, TSET}))
            REPRATT(MD) := ATTRIB ;;

        end ∀OI ;
    end ∀LPD ;

end proc REFINE ;


proc  ID-BASE()  ; 
$  This routine identifies a potentially important situation
$  in which a variable occurrence is identical in value with
$  its base.  If all the elements X of a base B are inserted
$  into B by the operation of a set former instruction which
$  generates a set S, then S and B will be identical
$  collections.  This ID-TO-BASE property can be propagated
$  from S to other occurrences which are linked, and only
$  linked, to S.

$  This routine is called by the routine REFINE and calls the
$  routine SETOF.

$  The global variables referenced by this routine include
$      BASE-ELMTS{B}   -  sets of occurrences which have been
$                         known to be elements of base B
$      ID-TO-BASE(OI)  -  true if occurrence OI is identical
$                         in value with its base
$      BFROM{OI}       -  occurrences to which OI is directly linked
$      FFROM{OI}       -  occurrences which are directly linked to OI

$  The macros used in this routine include
$      OFROMI(OI)      -  the ovariable of the instruction
$                         containing the ivariable OI
$      OI-OP(OI)       -  opcode of the instruction containing OI;
$                         see (M7).

$  The local variables defined in this routine are

repr
OI, WOI, SOI  :  ∈OI-BASE  ;        $  variable occurrences
WORK          :  set(∈OI-BASE)  ;   $  workpile of occurrences
end  repr  ;


$  For each base whose elements are inserted at only one
$  place

(∀BASE ∈ domain BASE-ELMTS | #BASE-ELMT{BASE} = 1)

$  Find the variable occurrence inserted into the base.

WOI := arb BASE-ELMTS{BASE} ;

$  If WOI is not an argument of a set former, bypass the
$  base.  The value SOI returned from the routine SETOF
$  is the set being constructed if WOI is an argument
$  of a set former, otherwise undefined.

if (SOI := SETOF(WOI)) = OM then continue ∀ ;

$  Otherwise, the ovariable of the set former instruction
$  is identical in value to its base.

ID-TO-BASE(SOI) := TRUE ;

$  Propagate the property ID-TO-BASE of SOI to the
$  occurrences which are linked to SOI and only to SOI.

WORK := {OI, [-, OI] ∈ FFROM{SOI} | #BFROM{OI} = 1} ;


(while WORK /= NL)
OI from WORK ;
ID-TO-BASE(OI) := TRUE ;

$  Propagate through assignment instructions.

if OI-OP(OI) in OPS-ASN and IS-IVAR(OI)
and not ID-TO-BASE(OV := OFROMI(OI)) then
WORK with OV ; ;

$  Propagate through FFROM chain.

WORK +:= {WOI, [-, WOI] ∈ FFROM{OI} | #BFROM{WOI} = 1
and not ID-TO-BASE(WOI)} ;
end while ;
end ∀BASE ;

return  ; 

end  proc  ID-BASE  ; 


proc  SETOF(OI)  ; 

$  This routine returns OM unless OI is an argument of a set
$  former instruction, in which case the set generated by
$  the instruction is returned.  Since set former
$  instructions in SETL source programs will have been
$  expanded into a series of instructions, a number of
$  instructions must be examined to detect set formers.  This
$  routine is therefore compiler dependent and must be
$  updated whenever the compiler is modified.  If the
$  compiler could leave an explicit syntactic indication of
$  set former instructions, this routine could then be
$  replaced by a macro which simply detects the indicator.

$  This routine is called by the routine ID-BASE.

$  The global variables referenced by this routine include
$      BFROM{OI}    -  occurrences to which OI is directly linked
$      FFROM{OI}    -  occurrences which are directly linked to OI

$  The macros used in this routine include
$      OFROMI(OI)   -  the ovariable of the instruction
$                      containing the ivariable OI

$  The local variables defined in this routine are

repr
OI, WOI, NOI  :  ∈OI-BASE  ;        $  variable occurrences
WORK          :  set(∈OI-BASE)  ;   $  temporary set of occurrences
end  repr  ;

$  If OI is an argument of a set former then it will link to
$  and only to an occurrence which is pushed onto a stack
$  which is then subject to a Q1-SET1 operation.  The
$  following code detects this case.

$  If OI links to more than one occurrence we do not have the
$  configuration that we are looking for.

if #(WORK := {WOI, [-, WOI] ∈ FFROM{OI} | #BFROM{OI} = 1}) = 1 then
$  WOI is the occurrence linked to OI.

WOI := arb WORK ;

$  WOI should be pushed onto a stack.

if OI-OP(WOI) /= Q1-PUSH then return OM ;
else

$  NOI is the stack.

NOI := OFROMI(WOI) ;

if #(WORK := {WOI, [-, WOI] ∈ FFROM{NOI} | #BFROM{NOI} = 1}) = 1 then

$  The stack should be subject to a Q1-SET1
$  operation.

WOI := arb WORK ;

if OI-OP(WOI) /= Q1-SET1 then return OM ;

else return OFROMI(WOI) ;
end if OI-OP ;

else

return OM ;
end if # ;

end if OI-OP ;
else

return OM ;
end if # ;


end  proc  SETOF  ; 




proc MAKE-REMOTE(OI) ;

$  This boolean procedure determines whether an occurrence OI
$  should have a remote representation, by detecting its
$  appearance as an argument of operations which are
$  particularly inefficient for local representations.

$  This routine is called by the routine REFINE.

$  The global variables referenced by this routine include
$      COPY-FLAG(OI)   -  true if OI cannot be used destructively

$  The macros used in this routine include
$      OI-OP(OI)       -  opcode of the instruction which contains
$                         occurrence OI, see (M7).
$      ARGNO(OI)       -  argument number of occurrence OI, see (M6).

$  The local variables defined in this routine are

repr
OI  :  ∈OI-BASE  ;      $  variable occurrence
OP  :  ∈OPCODES  ;      $  opcode
end  repr  ;


OP  :=  OI-OP(OI)  ; 

$  If OI is an argument of an algebraic or boolean operation
if (OP ∈ {Q1-ADD, Q1-SUB, Q1-MULT, Q1-MOD, Q1-INC, Q1-EQ, Q1-NE})

$  Or OI is the ovariable of a retrieval operation
or (ARGNO(OI) = 1 and OP in {Q1-OF, Q1-ARB, Q1-FROM})

$  Or OI is the input argument of a set or tuple former
or (ARGNO(OI) = 2 and OP in {Q1-PUSH, Q1-POP})

$  Or OI is the input argument of an incorporation
$  operation
or (ARGNO(OI) = 3 and OP in {Q1-SOF, Q1-WITH})

$  Or OI cannot be used destructively
or COPY-FLAG(OI)

$  Or OI is a formal parameter or an actual argument of a
$  procedure
or (OP in {Q1-ARGIN, Q1-ARGOUT})

then

$  Then OI should have a remote representation.

return TRUE ;
else

return FALSE ;
end if ;

end  proc  MAKE-REMOTE  ; 
end  AUTO-DSTRUCT  ; 
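Readers without a SETL system can mimic the decisions made by REFINE, ID-BASE and MAKE-REMOTE in a few lines of Python. This is a minimal sketch under stated assumptions, not a transcription of the module. Occurrences are plain records. The BFROM/FFROM links are replaced by hypothetical explicit `succ`/`pred` maps. The opcode names are those reconstructed from the listing above. Several parts of the module are left out: REFINE's restriction to domain-based sets and maps, the assignment-instruction case of ID-BASE, and the SETOF pattern match.

```python
from collections import deque
from dataclasses import dataclass

# Opcode names as reconstructed from the listing; the encoding is hypothetical.
ARITH_BOOL_OPS = {"Q1-ADD", "Q1-SUB", "Q1-MULT", "Q1-MOD", "Q1-INC", "Q1-EQ", "Q1-NE"}
ITERATION_OPS = {"Q1-NEXT", "Q1-NEXTD"}


@dataclass
class Occ:
    name: str
    op: str                  # stands in for OI-OP: opcode of the containing instruction
    argno: int               # stands in for ARGNO: argument position in that instruction
    copy_flag: bool = False  # stands in for COPY-FLAG: cannot be used destructively


def make_remote(oi):
    """Analogue of MAKE-REMOTE: arguments of operations costly on local representations."""
    return (oi.op in ARITH_BOOL_OPS
            or (oi.argno == 1 and oi.op in {"Q1-OF", "Q1-ARB", "Q1-FROM"})
            or (oi.argno == 2 and oi.op in {"Q1-PUSH", "Q1-POP"})
            or (oi.argno == 3 and oi.op in {"Q1-SOF", "Q1-WITH"})
            or oi.copy_flag
            or oi.op in {"Q1-ARGIN", "Q1-ARGOUT"})


def propagate_id_to_base(seeds, succ, pred):
    """Worklist in the spirit of ID-BASE: an occurrence fed only by an
    ID-TO-BASE occurrence is itself identical in value to its base."""
    marked = set(seeds)
    work = deque(w for s in seeds for w in succ.get(s, ()))
    while work:
        oi = work.popleft()
        if oi in marked or len(pred.get(oi, ())) != 1:
            continue          # already marked, or fed by more than one occurrence
        marked.add(oi)
        work.extend(succ.get(oi, ()))
    return marked


def refine(live_period, id_to_base):
    """REFINE's choice for one live period: SPARSE first, then REMOTE, else LOCAL."""
    if any(oi.op in ITERATION_OPS and oi.name not in id_to_base for oi in live_period):
        return "SPARSE"
    if any(make_remote(oi) for oi in live_period):
        return "REMOTE"
    return "LOCAL"
```

The order of the tests in `refine` follows the listing. A sparse representation wins whenever some occurrence that is not identical to its base is iterated over. Only otherwise does a single remote-forcing occurrence make the whole live period remote.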
APPENDIX  A  :  PRIMITIVE  SETL  OPERATIONS


Operation            Remarks

X + Y                integer and real addition, set union, character
                     string and tuple concatenation
X - Y                integer and real subtraction, set difference
X * Y                integer and real multiplication, set intersection,
                     tuple and character string repetition
X / Y                integer and real division, set symmetric difference
X // Y               arithmetic remainder function
X ** Y               arithmetic exponentiation
X and Y              boolean and
X or Y               boolean or
X implies Y          boolean implies
not X                boolean negation
X = Y                equality comparison
X /= Y               inequality comparison
X > Y, X < Y, etc.   arithmetic comparisons
X := Y               simple assignment
#X                   cardinality of set, length of tuple and string
arb X                select arbitrary element of set
X with Y             set extension, equivalent to X + {Y}
X less Y             set contraction, equivalent to X - {Y}
X in Y, X notin Y    set membership tests
X incs Y             set inclusion test
pow(X)               power set
{X,Y,...}            set with specified elements
[X,Y,...]            tuple with specified elements
F(X)                 function or subprocedure call, indexing to component
                     of tuple or string.  If F is a map, F(X) is the
                     unique Y such that [X,Y] in F, if such exists;
                     otherwise F(X) is undefined.
F(X1,..,Xn)          function or subprocedure call.  If F is a set,
                     F(X1,..,Xn) is the unique Y such that
                     [X1,..,Xn,Y] in F, if such exists; otherwise
                     F(X1,..,Xn) is undefined.
F{X1,..,Xn}          F must be a map; F{X1,..,Xn} is the set of all
                     Y such that [X1,..,Xn,Y] in F.
F[X1,..,Xn]          F must be a map; F[X1,..,Xn] is the set of all
                     Y for which there exist Z1,..,Zn with Z1 in X1,
                     ..., Zn in Xn and [Z1,..,Zn,Y] in F.
F(X:Y)               extract subpart of length Y starting at component
                     X of string or tuple F.
F(X) := Y            assignment operation corresponding to retrieval
                     operator F(X).
F(X1,..,Xn) := Y     assignment operator corresponding to retrieval
                     operator F(X1,..,Xn).
APPENDIX  B  :  ALPHABETICAL  LISTING  OF  GLOBAL  NAMES  REFERENCED

ALL-,!            -  variable, see (V10).
;VI.I,,0          -  variable, see (V9).
ALL-OI            -  variable, see (V8).
ARGNO             -  macro, see (M6).
ARGS              -  variable, see (V22).
ARG1              -  macro, see (M1).
ARG2              -  macro, see (M2).
ARG3              -  macro, see (M3).
BASE-ELMT         -  variable, see (V27).
BASENAM           -  macro, see (M24).
BASING-PROP       -  procedure, see (P24).
BFROM             -  variable, see (V11).
BLOCK-BASE        -  base, see (V17).
BLOCKOF           -  variable, see (V20).
BMODE             -  variable, see (V29).
CAN-DROP          -  macro, see (M32).
COMBASE           -  macro, see (M31).
COMPTYP           -  macro, see (M19).
CONSTR-PS-CRTHIS  -  procedure, see (P2).
CTYPN             -  macro, see (M20).
DOMBASE           -  macro, see (M29).
DOMTYP            -  macro, see (M22).
ELMBASE           -  macro, see (M28).
EQUIV             -  procedure, see (P14).
ERROR-PATH        -  constant, see (C8).
FFROM             -  variable, see (V12).
FORALLCODE        -  macro, see (M4).
GENBASE           -  procedure, see (P1).
GENLOCS           -  procedure, see (P3).
GLTYP             -  macro, see (M27).
GMAP              -  constant, see (C20).
GROSSTYP          -  macro, see (M18).
GSET              -  constant, see (C21).
HASH-USE          -  variable, see (V33).
ID-BASE           -  procedure, see (P26).
ID-TO-BASE        -  variable, see (V35).
IFROMO            -  macro, see (M14).
INSERTLOCS        -  procedure, see (P13).
INSTNO            -  macro, see (M5).
INSTRS            -  base, see (V16).
IS-CONST          -  variable, see (V4).
IS-FORMAL         -  variable, see (V24).
IS-GLOB           -  variable, see (V5).
IS-HASHED         -  macro, see (M13).
IS-IVAR           -  macro, see (M12).
IS-OVAR           -  macro, see (M11).
IS-PRIM           -  macro, see (M17).
KNT               -  constant, see (CIS).
LASTCALL          -  procedure, see (P18).
LENTYP            -  macro, see (M21).
LIVEPDS           -  variable, see (V32).
LOCAL             -  constant, see (C41).
LPD-BASE          -  base, see (V31).
MAKE-REMOTE       -  procedure, see (P28).
MAPSET            -  constant, see (C38).
MAPTUP            -  constant, see (C36).
MERGE             -  procedure, see (P11).
MERGE-INTO        -  procedure, see (P12).
MERGEOBJ          -  procedure, see (P4).
MGTYP             -  macro, see (M26).
MODE              -  variable, see (V30).
MODE-BASE         -  base, see (V28).
MODECMPRS         -  procedure, see (P21).
MODEDIS           -  procedure, see (P16).
MOVELOCS          -  procedure, see (P19).
NAME              -  variable, see (V2).
NBASEDOK          -  variable, see (V26).
NBASES            -  variable, see (V25).
NEXT              -  variable, see (V19).
NON-HASH-USE      -  variable, see (V34).
NULL-PATH         -  constant, see (C7).
OFROMI            -  macro, see (M15).
OI-BASE           -  base, see (V6).
OI-INTOV          -  macro, see (M10).
OI-NAME           -  macro, see (M8).
OI-OP             -  macro, see (M7).
OI-VALUE          -  macro, see (M9).
OPCODE            -  variable, see (V21).
OPCODES           -  base, see (V18).
OPS-ASN           -  constant, see (C1).
OPS-CREATE        -  constant, see (C4).
OPS-HASH          -  constant, see (C2).
OPS-RETRIEVE      -  constant, see (C3).
PARTITION         -  procedure, see (P17).
PROPELMT          -  procedure, see (P5).
PROPOFMAP         -  procedure, see (P6).
PROPOFTUP         -  procedure, see (P8).
PROPSOFAMAP       -  procedure, see (P10).
PROPSOFMAP        -  procedure, see (P7).
PROPSOFTUP        -  procedure, see (P9).
PS-CRTHIS         -  variable, see (V13).
RANBASE           -  macro, see (M30).
RANTYP            -  macro, see (M23).
RC-BASE           -  base, see (V7).
RC-CALL           -  constant, see (C5).
RC-RETN           -  constant, see (C6).
REALB             -  procedure, see (P15).
REFINE            -  procedure, see (P25).
REMOTE            -  constant, see (C40).
REPRATT           -  macro, see (M25).
SB-BASE           -  base, see (V23).
SETOF             -  procedure, see (P27).
SETTUP            -  constant, see (C37).
SPARSE            -  constant, see (C39).
STRUCTPART        -  macro, see (M16).
SUBSTMD           -  procedure, see (P22).
SYMBOLS           -  base, see (V1).
TA                -  constant, see (CIS).
TBASE             -  constant, see (C23).
TC                -  constant, see (C24).
TELMT             -  constant, see (C22).
TG                -  constant, see (C32).
THTUP             -  constant, see (C28).
TI                -  constant, see (C25).
TL                -  constant, see (C16).
TLC               -  constant, see (C14).
TLI               -  constant, see (C11).
TMAP              -  constant, see (C31).
TMTUP             -  constant, see (C27).
TNUM              -  constant, see (C26).
TOM               -  constant, see (C9).
TP                -  constant, see (C17).
TR                -  constant, see (C12).
TSC               -  constant, see (C13).
TSET              -  constant, see (C30).
TSI               -  constant, see (C10).
TSTRUCT           -  constant, see (C34).
TTUP              -  constant, see (C29).
TYPE-BASE         -  base, see (V14).
TYPES             -  variable, see (V15).
TZ                -  constant, see (C33).
TZSTRUCT          -  constant, see (C35).
UNT               -  constant, see (C19).
UPDMODES          -  procedure, see (P20).
USE-DETERM        -  procedure, see (P23).
VALUE             -  variable, see (V3).
PREFACE
Abstract 

Formal  differentiation  is  a  program  optimization  technique 
which  generalizes  John  Cocke's  method  of  strength  reduction  and 
provides  a  convenient  framework  with  which  to  implement  a  host 
of program transformations including Jay Earley's 'iterator
inversion'.   This  technique  captures  a  commonly  occurring  yet 
distinctive  mechanism  of  program  construction  in  which  succinct 
algorithms  involving  costly  repeated  calculations  are  transformed 
into  more  efficient  incremental  versions. 

The  basic  formal  idea  of  this  technique  can  be  put  as 
follows. Suppose that an expression $C = f(x_1, \ldots, x_n)$ will be
used repeatedly in a program region R, but that its calculation
cannot be moved outside R because its parameters $x_1, \ldots, x_n$
are modified within R. If we make C available on each entry to
R  (by  calculating  it  before  entry)  and  keep  C  available  within 
R  by  recalculating  it  each  time  one  of  its  parameters  is  modified, 
then  we  may  be  able  to  avoid  all  full  calculations  of  C  within  R. 

For this approach to be reasonable we require the following
heuristic condition to hold: within R, for each redefinition
$x_i = \Delta x_i$ to a parameter $x_i$, there should exist code which
can be inserted immediately before and after the redefinition
point p and which serves to keep C available within R.
We refer to this inserted code as the pre and post derivative
code of f with respect to the change $x_i = \Delta x_i$. We require
this code, as well as all code necessary to maintain available
expressions on which it depends, to consist of "easy"
calculations relative to the cost of a fresh calculation of f
(i.e., operations heuristically less costly in time). If this is
the case, then we say (suggestively though only heuristically
and, of course, not in the standard technical sense) that the
expression f is continuous in its parameters relative to the
modifications occurring within R. If we cannot demonstrate
that f is continuous with respect to a particular parameter
change, we will say that f is discontinuous in that parameter
change.

Until Earley's work, the preceding idea had been applied at the
Fortran level to expressions continuous in all of their
parameters. Earley introduced the application of this idea in a
set theoretic context with his discovery of 'iterator inversion'.
Moreover, Earley's transformations could also handle expressions
that are discontinuous in some of their parameters. However,
Earley did not provide an implementation design.

In the present paper we unify the techniques of Cocke and
Earley, and provide algorithms which can implement these
formal differentiation transformations both automatically and
semiautomatically for programming languages ranging from
Fortran to SETL. However, we find that success is best achieved
in the case of SETL. Thus, we study set theoretic formal
differentiation in depth and present a comprehensive
semiautomatic implementation design for a restricted version of
SETL called Subsetl. We show that the expected speedup due to
transformations applied by our proposed system can be as
great as an order of magnitude. In particular, we regard
differentiation of general set formers

$C = \{x \in S \mid K(x, t_1, \ldots, t_m)\}$

as being of primary importance.

We estimate that the cost of executing a calculation C
repeatedly in a loop L is proportional to $N \times \#S \times \mathrm{Cost}(K)$,
where N is the iteration count of L. The formal
differentiation transformations applied by our system will
keep the value of C available in either $(N + \#S) \times \mathrm{Cost}(K)$
or $(N + \#S \cdot \log \#S) \times \mathrm{Cost}(K)$ elementary steps, and
this will usually imply a speedup.
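A toy case can check this cost comparison. The Python sketch below makes several assumptions. The parameters t_1, ..., t_m are region constants of the loop, so K depends only on x. The loop changes S only by inserting one new element per iteration. The class `SetFormer`, the function `count_calls` and the predicate are hypothetical. The sketch counts evaluations of K in two cases: when C is recalculated on every iteration, and when C is kept available by derivative code executed at each change to S.

```python
class SetFormer:
    """Keeps C = {x in S | K(x)} available while S changes (hypothetical helper)."""

    def __init__(self, initial, k):
        self.k = k
        self.s = set(initial)
        self.c = {x for x in self.s if k(x)}   # initial calculation: #S tests of K

    def add(self, z):
        """Derivative code for inserting z into S: at most one test of K."""
        if z not in self.s:
            if self.k(z):
                self.c.add(z)
            self.s.add(z)

    def remove(self, z):
        """Derivative code for deleting z from S: no test of K needed."""
        if z in self.s:
            self.c.discard(z)
            self.s.discard(z)


def count_calls(n_iters, initial):
    """Count evaluations of K: full recalculation per iteration vs. maintained C."""
    calls = {"naive": 0, "incr": 0}

    def k_naive(x):
        calls["naive"] += 1
        return x % 3 == 0

    def k_incr(x):
        calls["incr"] += 1
        return x % 3 == 0

    s = set(initial)
    maintained = SetFormer(initial, k_incr)
    for i in range(n_iters):
        z = 1000 + i                               # new element, assumed not already in S
        s.add(z)
        maintained.add(z)
        recomputed = {x for x in s if k_naive(x)}  # naive: recalculate C from scratch
        assert recomputed == maintained.c          # both versions agree on C
    return calls["naive"], calls["incr"]
```

With N = 10 and #S = 100, `count_calls(10, range(100))` returns (1055, 110). That is 1055 evaluations of K for repeated recalculation against 110 for the differentiated version, in line with the N × #S versus N + #S estimate. The naive count is slightly above N × #S only because S grows.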

We illustrate our proposed system by considering and
improving  eight  sample  Subsetl  programs.   We  feel  that  these 
initial  case  studies  lend  strong  support  to  further  efforts 
to  fully  automate  and  incorporate  set  theoretic  formal  differ- 
entiation  as  part  of  an  optimizing  compiler. 
I.  INTRODUCTION

Continued  development  of  very  high  level  languages 
depends  in  part  on  our  ability  to  recognize  common  major 
aspects  of  programming  style  as  resulting  from  the  applica- 
tion of  some  standard  technique  of  program  improvement  to  an 
underlying  program  prototype.   A  technique  of  program  improve- 
ment that  we  are  able  to  perceive  as  general  can  become  the 
basis  for  a  general  optimization  method;  and  once  this  method 
is  in  hand,  we  can  safely  write  programs  in  relatively  simple 
unoptimized  forms,  since  their  more  complex  optimized  forms 
will  be  seen  as  obvious  improvements,  derivable  mechanically 
or  semimechanically  from  these  simple  forms. 

This  thesis  describes  an  optimization  method,  formal 
differentiation,  which  generalizes  the  classical  method  of 
'reduction  in  operator  strength'.   The  method  captures  a 
commonly occurring yet distinctive mechanism of program
construction  in  which  succinct  algorithms  involving  costly 
repeated  calculations  are  transformed  into  more  efficient 
incremental  versions.   When  applied  to  set  theoretic  dictions 
as  found  in  a  language  such  as  SETL  (cf.  Appendix  A  for  SETL 
description) ,  this  technique  can   transform  algorithms  from 
high  level  concise  but  inefficient  problem  statements  into 
more  complex  but  efficient  program  versions. 

Much  of  our  work  is  based  on  Jay  Earley's  technique  of 
'iterator  inversion'  which  was  applied  in  the  set  theoretic 
context of his proposed language VERS2 [E1,E2]. The current
thesis will view 'iterator inversion' along with its
generalizations  as  a  kind  of  'formal  differentiation' 
of  algorithms.   Although  the  idea  of  formal  differentiation 
can  be  applied  independently  of  semantic  level,  it  is 
particularly  well  suited  to  very  high  level  languages. 
Thus,  we  will   describe  an  implementation  design  for  an 
interactive  semiautomatic  system  which  would  facilitate  the 
application  of  this  technique  to  algorithms  written  in 
SETL.   We  give  pragmatic  rules  for  the  recognition  and  treat- 
ment of  reasonably  general  cases  in  which  this  optimization 
is  applicable,  and  consider  some  of  the  problems  which  arise 
in  actually  attempting  to  install  this  optimization  as  part 
of  a  compiling  system. 

Before  presenting  a  formal  description  of  our  method, 
we  trace  through  its  origins.   Historically,  similar  trans- 
formations have  been  used  in  numerical  techniques  to  tabulate 
values  of  mathematical  functions;  e.g.,  using  the  compound 
growth  formula, 

(1)  $f(t) = P \cdot (1 + I)^t$

for a given initial value P and growth rate I, we can
calculate the sequence f(0), f(1), f(2), ... by first computing
f(0) = P and then generating each successive entry f(t)
from the previous entry f(t - 1) by applying the identity
$f(t) = f(t - 1) \cdot (1 + I)$. The computational cost of this
method is usually much less than the cost of repeated
calculations of (1).
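The Python sketch below contrasts the two ways of tabulating (1); the function names are illustrative. The incremental version performs one multiplication per entry, while the direct version evaluates a power for every t.

```python
def growth_table_direct(p, rate, n):
    """f(t) = P * (1 + I)**t evaluated afresh for t = 0, ..., n-1."""
    return [p * (1 + rate) ** t for t in range(n)]


def growth_table_incremental(p, rate, n):
    """Same table, each entry derived from the previous one."""
    factor = 1 + rate          # invariant, computed once
    table, f = [], p           # f(0) = P
    for _ in range(n):
        table.append(f)
        f = f * factor         # f(t) = f(t-1) * (1 + I)
    return table
```

Both functions produce the same table up to floating-point rounding.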
Charles Babbage's difference engine further
illustrates this idea [G1]. Babbage's early computer could
perform only one arithmetical operation, addition, but by
use of difference polynomials he could program his machine
to calculate tables for polynomials efficiently. We can see
how to do this by noting that for a given polynomial p(x)
of degree n and an increment $\Delta$, the first difference
polynomial $p_1(x) = p(x + \Delta) - p(x)$ is of degree n-1 or less, the
second difference polynomial $p_2(x) = p_1(x + \Delta) - p_1(x)$ is
of degree n-2 or less, ..., and finally $p_n(x)$ must be a
constant. Thus, in order to generate a sequence of polynomial
values $p(x_0), p(x_0 + \Delta), p(x_0 + 2\Delta), \ldots$, we can do the
following:

1. Calculate initial values $p(x_0), p_1(x_0), \ldots, p_n(x_0)$,
which are stored in components t(1), t(2), ..., t(n+1) of an
(n+1)-tuple t.

2. Generate the desired polynomial table by iterating
over the code block below:

PRINT x, t(1);            /* PRINT x and p(x)              */
x := x + Δ;               /* increment x                   */
t(1) := t(1) + t(2);      /* place new values for          */
t(2) := t(2) + t(3);      /* p(x), p1(x), ..., p(n-1)(x)   */
   ...
t(n) := t(n) + t(n+1);    /* into t(1), t(2), ..., t(n)    */
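The same scheme can be sketched in Python. Here `forward_differences` builds the initial tuple from n+1 direct evaluations of p; this is one possible initialization, assumed for the sketch. The loop in `tabulate` then produces each further table entry using additions only. Python lists are 0-based, so t[0] plays the role of t(1). Updating t[j] in increasing order of j matters, because each update must read the old value of t[j+1].

```python
def forward_differences(p, x0, delta, n):
    """Initial tuple [p(x0), p1(x0), ..., pn(x0)] for a polynomial p of degree n."""
    values = [p(x0 + j * delta) for j in range(n + 1)]
    t = []
    for _ in range(n + 1):
        t.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]  # next difference row
    return t


def tabulate(p, x0, delta, n, count):
    """Tabulate p at x0, x0 + delta, ... using additions only after initialization."""
    t = forward_differences(p, x0, delta, n)
    x, rows = x0, []
    for _ in range(count):
        rows.append((x, t[0]))      # PRINT x, t(1)
        x += delta                  # x := x + delta
        for j in range(n):          # t(j) := t(j) + t(j+1), in increasing j
            t[j] += t[j + 1]
    return rows
```

Take p(x) = x^2 + 2x + 1 with x_0 = 0 and Δ = 1. The initial tuple is [1, 3, 2], and `tabulate` returns (0, 1), (1, 4), (2, 9), (3, 16), (4, 25).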

In the 1960s John Cocke emphasized the general significance
of an optimization method he called 'reduction in operator
strength', which incorporates the ideas just mentioned and
applies them to Fortran level code [SchV].
Cocke's original techniques have since been generalized and
implemented with various improvements (see [A1, C1, C2, K1-K5]).
We illustrate his method with the following simple example.
Suppose that an expression i * c occurring in a strongly
connected region R cannot be moved out of R because of
redefinitions to i. (We assume here that c is a region constant
of R.) Suppose also that the variable i is defined before each
entry to R and that all redefinitions to i within R are of the
form i = i + Δ, where Δ is a region constant of R. Then we can
use the following idea to move all calculations of i * c out of
R. Since i is defined on entrance to R, we can insert an
assignment T = i * c to a unique compiler generated variable T
just prior to each entry point of R. Within R, immediately
before each redefinition i = i + Δ, we can preserve the value
of i * c in T by executing the update assignment T = T + Δ * c
(whose form follows from the distributive law). Note that
Δ * c is invariant, and its calculation can be moved out of R.
Finally, we see that all calculations of i * c are redundant
in R and can be replaced by uses of T.
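The following Python sketch shows the transformation, with the region R modelled as a simple loop. The text allows any strongly connected region, so the loop is an assumption. Both functions return the values of i * c used inside the region. The reduced version performs no multiplication inside the loop: T is initialized before entry, Δ * c is hoisted, and T is updated at each redefinition of i.

```python
def products_direct(i0, delta, c, steps):
    """Original code: i * c recomputed inside the region."""
    out, i = [], i0
    for _ in range(steps):
        out.append(i * c)          # multiplication inside the loop
        i = i + delta
    return out


def products_reduced(i0, delta, c, steps):
    """Strength-reduced code: multiplication replaced by additions."""
    out, i = [], i0
    t = i * c                      # T = i * c, inserted before entry to R
    dc = delta * c                 # region constant, moved out of R
    for _ in range(steps):
        out.append(t)              # use of i * c replaced by T
        t = t + dc                 # T = T + delta * c, just before i = i + delta
        i = i + delta
    return out
```

For i_0 = 3, Δ = 2 and c = 7, both return [21, 35, 49, 63, 77, 91].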

A slightly more complicated example will serve to
illustrate the deeper problems which can arise in applying
reduction in strength when data values must be traced from
one variable to another. Consider the last example once
more, but now permit redefinitions to i within R of the
forms i = -k, i = -k + ℓ and i = k + ℓ, where k and ℓ
are either region constants or variables. In such a case
we can sometimes still reduce the multiplication i * c to
successive additions and copy operations. After inserting
the initial assignments T = i * c, as before, we can keep
the value of i * c current in T by inserting the following
code just prior to each redefinition to i:

redefinition        update code

i = -k              T = -k*c
i = -k + ℓ          T = -k*c + ℓ*c
i = k + ℓ           T = k*c + ℓ*c

Any  product  i  *  c  or  c  *  i  introduced  as  part  of  the  update 
code  can  be  replaced  by  T.   All  region  constant  products  can 
be  moved  out  of  R.   Common  subexpressions  introduced  by  this 
transformation  can  be  eliminated.  Finally,  for  i  *  c  to  be 
reduced  in  strength,  all  remaining  products  occurring  within 
the  update  code  must  be  reducible  in  strength.   If  this 
condition  is  satisfied  and  if  the  time  cost  of  the  addition 
operations  inserted  into  R  by  strength  reduction  is  less 
than  the  cost  of  the  multiplications  i  *  c  removed  from  the 
original   text,  then  a  constant  factor  improvement  in  running 
time  should  be  obtained. 
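The Python sketch below exercises the update table. An operand k or ℓ is either an integer region constant or the string "i", standing for the variable itself; this encoding and the helper names are hypothetical. Following the text, any product i * c that arises in the update code is replaced by the current value of T. The remaining products are simply recomputed here, although a real optimizer would move them out of R. The assertion after each step checks that T = i * c stays available.

```python
def update(form, k, ell, i, t, c):
    """Return (new i, new T) for one redefinition of i, using the update code."""

    def value(v):
        return i if v == "i" else v

    def times_c(v):
        # v * c, with the product i * c replaced by the maintained value T
        return t if v == "i" else v * c

    if form == "neg":            # i = -k        update: T = -k*c
        return -value(k), -times_c(k)
    if form == "neg_plus":       # i = -k + l    update: T = -k*c + l*c
        return -value(k) + value(ell), -times_c(k) + times_c(ell)
    if form == "plus":           # i = k + l     update: T = k*c + l*c
        return value(k) + value(ell), times_c(k) + times_c(ell)
    raise ValueError(f"unknown redefinition form: {form}")


def run(i0, c, redefs):
    """Apply a sequence of redefinitions while keeping T = i * c available."""
    i, t = i0, i0 * c            # T = i * c inserted before entry to R
    for form, k, ell in redefs:
        i, t = update(form, k, ell, i, t, c)
        assert t == i * c        # T is still available after the redefinition
    return i, t
```

For example, `run(2, 5, [("plus", "i", 3), ("neg", "i", None), ("neg_plus", 4, "i"), ("plus", 1, 1)])` ends with i = 2 and T = 10. None of the updates in this run multiplies i by c.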

The program transformations described above represent
a most basic kind of 'formal differentiation', a term which we
believe is more descriptive of the process being applied than
Cocke's term 'reduction in strength'. Note, also, that the
phrase 'reduction in strength' has been applied to other
optimization techniques which replace a costly operation with a
less expensive one (e.g., peephole optimizations such as the
string concatenation removal
length(STRING1 || STRING2) => length(STRING1) + length(STRING2)).

For systematic discussion of the notion of formal
differentiation, it is convenient to introduce some definitions
and notational devices. We will sometimes use the notation
$C = f(x_1, \ldots, x_n)$ to associate a text expression f with a
unique compiler generated variable C. We will assume that
whenever f is executed, its value, calculated from the values
of its free variables $x_1, \ldots, x_n$ and constants, is placed in C.

We will say that C is available on exit from a program point
p if C is equal to the value which the expression f would
have if evaluated immediately after the statement at p is
executed; C is available on entrance to p if C is available
on exit from all predecessor points of p. If C is available
on entrance to p, and if C is not available on exit from p
(which will happen when execution of the statement at p
changes the value of a parameter $x_i$ upon which the value of
f depends), then we say that C is spoiled at p. C is
available on entrance to a program region R if it is available
on entrance to each entry point of R.

As  Schwartz  notes  in  [C2],  reduction  in  strength  (which 
we  shall  call  formal  differentiation)  is  a  very  general  and 
powerful  optimization  method  applicable  to  a  wide  class  of 
operations.  The  basic  formal  idea  of  this  technique  can  be 
put as follows. Suppose that an expression $C = f(x_1, \ldots, x_n)$
will be used repeatedly in a program region R, but that its
calculation cannot be moved outside R because its parameters
$x_1, \ldots, x_n$ are modified within R. If we make C available on
each entry to R (by calculating it before entry) and keep
C available within R by recalculating it each time one of
its parameters is modified, then we may be able to avoid
all full calculations of C within R.

For this approach to be reasonable, there must be some
way of recalculating C more easily after its parameters are
modified than by calculating C afresh each time it is required.
For this to be the case, we are likely to require three
conditions which can be stated heuristically as follows:

(a) Each free variable $x_1, \ldots, x_n$ on which f depends must
be defined on entrance to R. This ensures that initial
calculations to make C available on entrance to R are possible.

(b) Within R, for each redefinition $x_i = \Delta x_i$ to a parameter
$x_i$, there should exist code which can be inserted immediately
before and after the redefinition point p and which
serves to keep C available within R (except possibly at
points of the update code itself). We refer to this inserted
code as the pre and post derivative code of f with respect to
the change $x_i = \Delta x_i$. We require this code, as well as all
code necessary to maintain available expressions on which
it depends, to consist of 'easy' calculations relative to f
(i.e., operations heuristically less costly in time). If
this is the case, then we say (suggestively though only
heuristically and, of course, not in the standard technical
sense) that the expression f is continuous in its parameters
relative to the modifications occurring within R. If we cannot
demonstrate that f is continuous with respect to a particular
parameter change, we will say that f is discontinuous
in that parameter change.

(c) No modification to an argument of $f(x_1, \ldots, x_n)$ may
occur in a strongly connected subregion Q of R, since it is
likely that any update code inserted into Q would be executed
much more frequently than at program points in R - Q, and
hence could cost much more in running time than the original
calculation $f(x_1, \ldots, x_n)$.

An  implementation  method  for  formal  differentiation 
must  include  the  following  steps:   1.  find  reduction  candidates; 
2.  test  for  the  enabling  conditions  above;  3.  provide  rules 
for  generating  pre  and  post  derivatives;  4.  transform  the 
original  code  by  successive  applications  of  the  strength 
reduction  method,  possibly  in  several  ways;  5.  select  the 
most  profitable  of  the  transformed  program  versions. 

Using  the  general  framework  above  it  is  possible  to 
methodically  reduce  classes  of   expressions  built  up  from 
rather  complicated  operations  and  data. 

Although Earley was the first to describe formal differentiation
in a set-theoretic context [E1], an implementation
design for set expressions was first provided by Fong and
Ullman [F1, F2]. They propose to reduce binary set operations
such as union, intersection, and set difference, as well as
more complicated expressions. They provide a straightforward
mechanism for detecting and reducing such expressions, and
within their model they predict possible program speedup by
as much as an order of magnitude, while also guaranteeing
a constant bounded space and work overhead. Instead of
maintaining the full value of a candidate expression within
a program region R, their method keeps the small variation
in value of this expression available in R. They emphasize
that, aside from the nature of specific operations, the
control flow structure is an important factor in determining
reduction in strength capabilities. Drawing on control flow
considerations, they demonstrate that certain 'induction'
variables on which reduction candidate expressions depend
may be redefined in R by reassignment to 'induction' expressions
as well as by differential modifications.

However, the work of Jay Earley is of central importance
to the present thesis. In [E1] he describes an interesting
data choice optimization for the high level set-theoretic
language VERS2 [E2]. Earley notes that his optimizations
will apply to any language of about the same level as VERS2,
and since SETL is roughly of this level, SETL dictions
define a convenient context for study of the Earley optimizations;
cf. [E1, S1, Sch1-6]. Earley calls his optimization
method 'iterator inversion', and thinks of it as a way of
automatically choosing a representation of a set or sequence
being iterated over which minimizes the number of operations
necessary for a VERS2 iterator to produce its stream
of values. Ideally, iterator inversion will produce an
iterator's stream of values (in the correct order) directly.
For example, an iterator x ∈ s | K(x) produces the stream
of all elements in the set s satisfying the Boolean subexpression
K(x). Iterator inversion, in Earley's sense,
will replace an iterator x ∈ s | K(x) by a simpler iterator
x ∈ s' where s' represents the set {x ∈ s | K(x)}. (Note
that s must not be free in K.) The iterator x ∈ s' avoids
the computation of K(x) for all elements x in s. Earley
shows how to keep s' available in a program region R by
means of incremental update rules; e.g., just before a
slight change s := s ∪ {a}, which adds the element a to s,
we can execute the code

    if K(a) then s' := s' ∪ {a}.

He also discusses setformer expressions

(2)  {x ∈ s | f(x) = q}

whose value is the set of all elements x of the set s
for which the value of the map f applied to the bound
variable x equals the free variable q. To handle (2) efficiently,
he constructs the map $T(q) = \{x \in s \mid f(x) = q\}$,
defined over all values that q can have in a program region
R containing (2). He indicates that the entire map T can
be kept available in R whenever all changes to s in R are
only element additions or deletions and all redefinitions
of the map f change only one range value of f at a particular
domain point. We will not describe his rules fully, since
in the next chapter we intend to describe our own similar
but more powerful transformations in a formal style.
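As a rough illustration of what keeping Earley's map T available involves (not a reconstruction of his actual rules), the following Python sketch assumes that s is a set, that f is a single-valued map held in a dict, and that the only changes are single-element additions to and deletions from s and single-point assignments f(y) := z. A dict of sets plays the role of T. The class and method names are hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Keeps T(q) = {x in s | f(x) = q} available for every q."""

    def __init__(self, s, f):
        self.s = set(s)
        self.f = dict(f)
        self.t = defaultdict(set)
        for x in self.s:                     # full calculation on entry to R
            if x in self.f:
                self.t[self.f[x]].add(x)

    def add(self, a):                        # s := s + {a}
        if a not in self.s:
            self.s.add(a)
            if a in self.f:
                self.t[self.f[a]].add(a)

    def delete(self, a):                     # s := s - {a}
        if a in self.s:
            self.s.discard(a)
            if a in self.f:
                self.t[self.f[a]].discard(a)

    def assign(self, y, z):                  # f(y) := z
        if y in self.s and y in self.f:
            self.t[self.f[y]].discard(y)     # before: drop y from its old T entry
        self.f[y] = z
        if y in self.s:
            self.t[z].add(y)                 # after: record y under its new value

    def query(self, q):                      # replaces {x in s | f(x) = q}
        return set(self.t.get(q, ()))
```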

In exploring a way to fit Earley's techniques into a
general framework, Schwartz noted that to reduce expressions
such as (2), it is necessary to deal with parameters such as
q upon which expressions do not depend continuously [Sch7].
The present thesis develops the concept of formal differentiation
further and applies it to a wider class of expressions
under broader conditions than have previously been considered.
In rudimentary form, our idea, which extends the basic
notions previously described (cf. p. 278), may be stated in
the following way. We consider as reduction candidates
those expressions

(3)  $C = f(x_1, \ldots, x_n)$

in a strongly connected region R which are continuous with
respect to redefinitions of some of their parameters, $x_1, \ldots, x_k$,
and discontinuous relative to changes to the others,
$x_{k+1}, \ldots, x_n$. Our formal differentiation method draws on the
fact that if the final group of parameters is given constant
values, then the expression $f = f(x_1, \ldots, x_k, q_{k+1}, \ldots, q_n)$
is continuous in its remaining parameters. To reduce (3)
we can therefore use a 'memo' function C which keeps several
values $C(q_{k+1}, \ldots, q_n) = f(x_1, \ldots, x_k, q_{k+1}, \ldots, q_n)$ of f
available in R. This makes it possible to avoid redundant
calculations of (3) by replacing each such use by a retrieval
operation $C(x_{k+1}, \ldots, x_n)$. The actual reduction transformation
is sketched below:

i.   On entrance to R, initialize the mapping C by performing
     C := nullset;

ii.  Whenever any argument $x_1, \ldots, x_k$ is varied inside R,
     the pre and post derivative operations necessary to
     keep each stored value $C(q_{k+1}, \ldots, q_n)$ available in R must
     be performed.

iii. When one of the variables $x_{k+1}, \ldots, x_n$ changes inside R,
     no calculations need be made.

iv.  Replace each calculation $f(x_1, \ldots, x_n)$ in R by code
     which either retrieves a stored value $C(x_{k+1}, \ldots, x_n)$
     (if $[x_{k+1}, \ldots, x_n]$ is in the domain of C) or else calculates
     $f(x_1, \ldots, x_n)$ and records this value in C. This code is
     roughly as follows:

(4)  if [x_{k+1}, ..., x_n] ∈ PROJECT(n-k, C)
         /* PROJECT returns the domain of C */
     then C(x_{k+1}, ..., x_n)
     else C(x_{k+1}, ..., x_n) := f(x_1, ..., x_n)
         /* use of an assignment side effect within
            an expression */

The  PROJECT  function  used  in  (4)  is  a  projection  operator 
on  maps,  and  has  the  following  simple  SETL  definition: 

DEFINEF PROJECT(m, MAP);
/* if MAP is an n-ary SETL map viewed as a set of (n+1)-tuples,
   then when m <= n+1, PROJECT returns the set of m-tuples
   consisting of the first m component values of the
   (n+1)-tuples in MAP */
RETURN {x(1:m), x ∈ MAP};
END; 
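Steps i-iv can be sketched in Python as follows. The sketch assumes C is a dict keyed by the tuple [x_{k+1}, ..., x_n], so the domain test of (4) becomes a hash lookup rather than an explicit PROJECT; the function project mirrors the SETL PROJECT on a set of tuples. The names Memo, vary and value are hypothetical, and the derivative code for step ii must be supplied by the caller because it depends on f. Step iii needs no code at all: changing one of x_{k+1}, ..., x_n simply selects a different key.

```python
def project(m, mapping):
    """PROJECT(m, MAP): the first m components of every tuple in mapping."""
    return {t[:m] for t in mapping}


class Memo:
    """Memo function C for f(x_1, ..., x_n), continuous in x_1..x_k only."""

    def __init__(self, f, k):
        self.f = f
        self.k = k
        self.c = {}          # step i: C := nullset on entrance to R
        self.calls = 0       # number of full calculations of f

    def vary(self, derivative):
        """Step ii: after a change to one of x_1..x_k, update every stored
        value; the cost grows with the size of the domain of C."""
        for key in self.c:
            self.c[key] = derivative(key, self.c[key])

    def value(self, *xs):
        """Step iv, code (4): retrieve C(x_{k+1}, ..., x_n) or compute it."""
        key = tuple(xs[self.k:])
        if key not in self.c:            # key not in the domain of C
            self.calls += 1
            self.c[key] = self.f(*xs)    # record the new value in C
        return self.c[key]
```

In the usage below, f(s, q) selects the multiples of q from s; it is continuous in s (k = 1) and discontinuous in q, so one value is stored per q.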
The approach sketched above accepts the expense of
map retrieval operations (iv) and incremental update code (ii)
in order to eliminate redundant calculations of (3) in R.
Note, however, that when the domain of C is large, the
cost of performing pre and post derivatives of f with
respect to changes $x_j := A$ (for $j \le k$) may make formal
differentiation unprofitable, since every stored value of C
may need updating. For those fortunate cases
in which the domain of C is small, or when we can guarantee
that the derivative calculations can be limited to a small
portion of the domain of C, we will say that the discontinuity
parameters $x_{k+1}, \ldots, x_n$ of (3) are 'removable' and
that the map C is continuous in the continuity parameters
of f relative to the modifications occurring in R.

Chapter  two  develops  the  ideas  outlined  in   the 
preceding  discussion,  and  gives  numerous  examples  and  case 
studies  of  their  application.   Chapter  three  sketches  an 
implementation  design  for  a  basic  formal  differentiation 
system  which  only  handles  expressions  continuous  in  all 
of  their  parameters.   Chapter  four  extends  the  initial 
design  to  one  which  can  handle  most  of  the  examples  presented 
in  Chapter  two  that  are  not  handled  by  the  simpler  system. 
Chapter  four  also  considers  various  time  and  space  improve- 
ments which  could  be  incorporated  in  this  extended  design, 
and  it  concludes  with  some  remarks  proposing  directions  in 
future  research. 
II.  BASIC CONCEPTS AND SET THEORETIC EXAMPLES

A.    Preliminaries 

Application of the idea of formal differentiation in
a set-theoretic context was initiated by Earley and has
been pushed further by Fong and Ullman [F1], who made the
interesting observation that formal differentiation in a
set-theoretic milieu could actually improve the asymptotic
behavior of an algorithm and that this fact could be used
to develop a theoretical characterization of the situations
in which this technique applies. In the discussion
which follows, we shall pursue Earley's idea in a less
formal sense than that of Fong and Ullman, aiming to state
pragmatic rules for the discovery and treatment of reasonably
general cases in which formal differentiation can be
applied. (Of course, any reasonable criterion for evaluating
the utility of formal differentiation must rest on some
notion of expected efficiency, albeit only a heuristic one.
Such an informal complexity measure must at the very least
distinguish between 'easy' and 'complicated' operators.)

In this chapter, we will be concerned principally
with general sets and with mappings represented by sets of
tuples. Unless otherwise specified, we assume a hash table
implementation for sets in which entries are linked within
a two-way list, thus permitting a unit time membership test
and a linear time search through sets. Moreover, if element
addition and deletion can be performed directly on the body
of a set without copying, then these operations can also
be done in unit time. We also assume a similar hashed
implementation for maps in which various kinds of functional
application and change are done in time proportional to a
map's arity, and iteration through a map's domain takes
linear time [AHU1, DGS1, P1, Sch1]. In particular, the
data structure for SETL maps described in [DGS1] corresponds
to our assumptions.
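The set representation assumed above can be sketched in Python as follows (the class name is hypothetical): a dict provides hashed membership, and each entry records its predecessor and successor, so iteration walks a two-way list and deletion unlinks an entry in place. Python's built-in dict already preserves insertion order; the explicit links here only mirror the assumed layout.

```python
class LinkedHashSet:
    """Hashed set whose entries are linked in a two-way list."""

    _HEAD = object()  # sentinel closing both ends of the list

    def __init__(self, items=()):
        # each element maps to [prev, next]; the sentinel starts linked to itself
        self._links = {self._HEAD: [self._HEAD, self._HEAD]}
        for x in items:
            self.add(x)

    def __contains__(self, x):          # hashed membership test
        return x in self._links and x is not self._HEAD

    def add(self, x):                   # link x in at the end of the list
        if x in self:
            return
        last = self._links[self._HEAD][0]
        self._links[x] = [last, self._HEAD]
        self._links[last][1] = x
        self._links[self._HEAD][0] = x

    def discard(self, x):               # unlink x without copying the set
        if x not in self:
            return
        prev, nxt = self._links.pop(x)
        self._links[prev][1] = nxt
        self._links[nxt][0] = prev

    def __iter__(self):                 # linear-time walk along the links
        x = self._links[self._HEAD][1]
        while x is not self._HEAD:
            yield x
            x = self._links[x][1]

    def __len__(self):
        return len(self._links) - 1
```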

B.    Initial  Examples 

In SETL the computations s := s + {x} and s := s - {x}
respectively add the element x to, and delete it from,
the set s. Both operations change s only 'slightly'.
Similarly, if s and A are sets and the number of elements
of A, #A, is much smaller than #s, then modifications of
the form s := s + A represent 'slight' changes to s.
We expect that such changes can be performed destructively
at a cost proportional to #A by the obvious technique, which
is written in SETL as follows:

(∀x ∈ A)   /* linear time search through A */
    s := s + {x};   /* destructive unit time assignment */
end ∀;

If f is a set of pairs used as a SETL mapping, then the
operation f(x) := z, which replaces all pairs whose first
element is x by the pair [x, z], causes f to change slightly.
If f is a set of (n+1)-tuples used as a multiparameter
mapping, then the indexed assignment $f(x_1, \ldots, x_n) := z$
alters f only slightly.
The informal notion of 'slight' changes to a set
can be used to illustrate the notion of expression
'continuity'. Examples of SETL expressions continuous in
differential changes to all of their parameters are set
union s + t, set intersection s * t, and set difference s - t.
Consider the set difference operation

(1)  C = s - t.

If the value currently available for C is spoiled by a
differential change s := s + A (or s := s - A) to s at a program
point p, the value C of s - t can be restored on exit from p by
executing the corresponding update code

(2)  C := C + (A - t)   or   C := C - A

immediately prior to p. We will say concisely that the
update code (2) gives a 'prederivative' of the expression
(1) with respect to the change s := s + A or s := s - A, respectively.
Both redefinitions in (2) are slight changes to C and can
usually be performed at less expense than the full calculation
(1). Note that if (1) is performed in the obvious way,
we can expect its running time to be proportional to #s,
while the execution of each slight change in (2) will
require time proportional to #A, as realized for the first of
them by the following obvious implementation:

(3)  (∀x ∈ A)
         if x ∉ t then C := C + {x}; endif;
     end ∀;
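The prederivatives (2), together with those for changes to t given next in (4), can be checked with a small Python sketch (the class name is hypothetical) that keeps C = s - t available under the four kinds of slight change. Each update should touch only the elements of A, apart from hashing costs.

```python
class SetDifference:
    """Keeps C = s - t available under slight changes to s and t."""

    def __init__(self, s, t):
        self.s, self.t = set(s), set(t)
        self.c = self.s - self.t          # full calculation on entry

    def add_to_s(self, a):                # s := s + A
        self.c |= a - self.t              # C := C + (A - t)
        self.s |= a

    def remove_from_s(self, a):           # s := s - A
        self.c -= a                       # C := C - A
        self.s -= a

    def add_to_t(self, a):                # t := t + A
        self.c -= a                       # C := C - A
        self.t |= a

    def remove_from_t(self, a):           # t := t - A
        self.c |= a & self.s              # C := C + (A * s)
        self.t -= a
```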
Similarly, we see that (1) is continuous with respect to
small changes t := t + A and t := t - A. The respective
prederivative code is simply

(4)  C := C - A   or   C := C + (A * s).

These are both slight modifications to C and can be performed
in O(#A) expected steps.

Next consider the set union operation

(5)  C = s + t

in which s and t can have overlapping values. The obvious
implementation of this will run in O(#s + #t) time. The
prederivatives of (5) with respect to the changes s := s + A
and s := s - A are, respectively,

(6)  C := C + A   or   C := C - (A - t),

each of which will execute in time O(#A). Set intersection,

(7)  C = s * t,

has an obvious implementation requiring O(#s) steps. The
prederivatives of (7) with respect to the slight changes
s := s + A and s := s - A are, respectively,

(8)  C := C + (A * t)   or   C := C - A,

each of which has been seen to run in O(#A) steps.
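The union and intersection prederivatives (6) and (8) translate directly into destructive Python set updates. In this sketch (function names hypothetical) each function is meant to run immediately before the corresponding change to s, and it mutates the set C in place.

```python
def union_add(c, a):
    """(6) for C = s + t and s := s + A:  C := C + A."""
    c |= a


def union_remove(c, a, t):
    """(6) for C = s + t and s := s - A:  C := C - (A - t)."""
    c -= a - t


def inter_add(c, a, t):
    """(8) for C = s * t and s := s + A:  C := C + (A * t)."""
    c |= a & t


def inter_remove(c, a):
    """(8) for C = s * t and s := s - A:  C := C - A."""
    c -= a
```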

If f is a 1-ary function, then the SETL range function
(range of f on a set s), written as

(9)  C = f[s],

is continuous with respect to s := s + A, and its
prederivative code is:

(10)  C := C + f[A].

The obvious implementation of (9) would take O(#s) elementary
steps, in contrast to the approximately O(#A) steps
needed to perform (10). Note, however, that (9) is discontinuous
with respect to s := s - A (a value in f[A] may also be the
image of an element that remains in s, so it cannot simply be
removed from C) and with respect to indexed assignments such
as f(y) := z.

The inverse image $f^{-1}[s]$ can be written in SETL using
a setformer expression

(11)  C = {x ∈ DOM f | f(x) ∈ s}

where DOM f is the set of first components of pairs in f
(if f has an arity of one, then DOM f is the domain of f).
Expression (11) is continuous in f, and for changes
f(x) := y its prederivative code is

(12)  C := C - (if f(x) ∈ s then {x} else nullset)
             + (if y ∈ s then {x} else nullset).

The expected computational cost of an assignment (11)
is O(#f). Clearly the prederivative (12) should execute
in essentially constant time. Note, however, that the C
of (11) is discontinuous in s.
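A Python sketch of the prederivatives (10) and (12), assuming f is a single-valued map held in a dict, so that `x in f` plays the role of x ∈ DOM f and an undefined f(x) is never a member of s. The function names are hypothetical; each function runs just before the change it compensates for.

```python
def image_add(c, f, a):
    """(10) for C = f[s] and s := s + A:  C := C + f[A]."""
    c |= {f[x] for x in a if x in f}


def preimage_assign(c, f, s, x, y):
    """(12) for C = {x in DOM f | f(x) in s} and the change f(x) := y."""
    if x in f and f[x] in s:    # x leaves C if its old image was in s
        c.discard(x)
    if y in s:                  # x enters C if its new image is in s
        c.add(x)
```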

The operation f[s] may be continuous in s := s + A
even if f is a programmed function. The conditional expression
'if a then s1 else s2' is continuous in s1 and s2,
but is, of course, discontinuous in its boolean parameter a.

As Earley has emphasized, expressions involving setformers
provide more interesting examples of this phenomenon
of 'continuity'. The SETL expression

(13)  C = {x ∈ s | f(x) = q},

which computes the set of all elements of the set s for which
the boolean-valued subexpression f(x) = q holds, is a prototypical
example. This expression is continuous in s and f,
but discontinuous in q. If s is varied slightly by s := s + A,
then C can be updated by executing the prederivative

(14)  C := C + {x ∈ A | f(x) = q},

which represents a small change in C. When f is changed by
executing the indexed assignment f(y0) := z, then C can be
updated by executing

(15)  C := if y0 ∈ s then C - (if f(y0) = q then {y0} else nullset)
                         + (if z = q then {y0} else nullset)
           else C;

just before the assignment f(y0) := z.

More insight is gained by writing (15) as

(16)  C := C - {x ∈ {u ∈ s | u = y0} | f(x) = q}
             + {x ∈ {u ∈ s | u = y0} | z = q};

since (16) begins to suggest a rule for updating more general
set-theoretic expressions than (13). For example, before
changing f by f(y0) := z, the set

(17)  C1 = {x ∈ s | g(f(x)) = q}

can be updated by executing

(18)  C1 := C1 - {x ∈ {u ∈ s | u = y0} | g(f(x)) = q}
               + {x ∈ {u ∈ s | u = y0} | g(z) = q};

Note, however, that for (16) and (18) to be 'easy' calculations
relative to (13) and (17), we require that each computation
of {u ∈ s | u = y0} occurring in this update code be
'easy'. Of course, the automatic local transformation
turning {u ∈ s | u = y0} into the equivalent and inexpensive
operations

(19)  if y0 ∈ s then {y0} else nullset

applies to the examples above.
The setformer

(20)  C2 = {x ∈ s | f(g(x)) = q},

which can be updated by executing

(21)  C2 := C2 - {x ∈ {u ∈ s | g(u) = y0} | f(g(x)) = q}
               + {x ∈ {u ∈ s | g(u) = y0} | z = q};

can be handled by a combination of the transformations
already mentioned. For C2 to be continuous in s and f
(relative to modifications s := s + A and f(y0) := z
occurring in a strongly connected region R), all prederivative
code and all other attendant code necessary to
maintain available expressions on which the prederivatives
depend must consist of 'easy' calculations relative to C2.
Since by (13) and (14) we know that when g and y0 are
invariant in R, the costly subexpression {u ∈ s | g(u) = y0}
of (21) can be made available throughout R by inexpensive
operations, C2 is seen to be continuous in its parameters
s and f. All the updating operations (16), (18) and (21)
are to be performed just prior to the change f(y0) := z
for which they compensate.
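The updates (14) and (15) for the setformer (13), with q held fixed, might look as follows in Python. The class name is hypothetical, f is a dict, and the None returned by `f.get` for an undefined point stands in for SETL's undefined value, so q is assumed not to be None. Both updates are applied just before the change they compensate for.

```python
class SelectEq:
    """Keeps C = {x in s | f(x) = q} available for a fixed q."""

    def __init__(self, s, f, q):
        self.s, self.f, self.q = set(s), dict(f), q
        # full calculation, making C available on entry to the region
        self.c = {x for x in self.s if self.f.get(x) == q}

    def add(self, a):
        """s := s + A, preceded by the prederivative (14)."""
        self.c |= {x for x in a if self.f.get(x) == self.q}
        self.s |= a

    def assign(self, y0, z):
        """f(y0) := z, preceded by the update (15)."""
        if y0 in self.s:
            if self.f.get(y0) == self.q:   # y0 leaves C if f(y0) was q
                self.c.discard(y0)
            if z == self.q:                # y0 enters C if its new value is q
                self.c.add(y0)
        self.f[y0] = z
```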
Next consider the case of a set-theoretic expression
in whose defining condition f appears twice with different
arguments, as for example

(22)  C3 = {x ∈ s | f(g(x)) = f(h(x))}

or

(23)  C4 = {x ∈ s | f(f(x)) = q}.

Before changing f by f(y0) := z, we can update such sets
by computing a set s0 of all those elements of s over
which the boolean condition appearing in the setformer (22)
or (23) can change value as a result of the indexed assignment
to f. Then after the change to f we can adjust the
value of the setformer for those points of s0. In the case
of C3 and C4 this leads us to the following prederivative
and postderivative updating operations:

(24)  s0 := {x ∈ s | g(x) = y0 or h(x) = y0};
      f(y0) := z;
      (∀x ∈ s0)
          if f(g(x)) = f(h(x)) then
              C3 := C3 + {x};
          else
              C3 := C3 - {x};
          endif;
      end ∀;

and

(25)  s0 := {x ∈ s | f(x) = y0 or x = y0};
      f(y0) := z;
      (∀x ∈ s0)
          if f(f(x)) = q then
              C4 := C4 + {x};
          else
              C4 := C4 - {x};
          endif;
      end ∀;

In both (24) and (25) the setformer s0 must be
continuous in its parameters for C3 and C4 to be continuous.
If, for example, (23) occurs in a strongly connected region
R, within R all redefinitions to s and f are slight,
the free variable q of (23) is invariant, and each parameter
y of an indexed assignment f(y) := z appearing in R
is also invariant, then the auxiliary set s0 can be made
available within R at small cost. Moreover, although s0
depends continuously on f, it is interesting to note that
s0 is not spoiled by f(y0) := z in either of the code
sequences (24) or (25); this observation facilitates our
method of keeping s0 available in R.
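The update sequence (25) for C4 can be sketched in Python as follows (function names hypothetical; f is a dict, and q and y0 are assumed not to be None). For brevity the sketch finds s0 by scanning s; in the scheme described above s0 would itself be kept available, which is what makes the update cheap.

```python
def ff_eq(s, f, q):
    """Full calculation of C4 = {x in s | f(f(x)) = q}."""
    return {x for x in s if f.get(f.get(x)) == q}


def assign_with_c4(c4, s, f, q, y0, z):
    """Execute f(y0) := z while keeping c4 equal to ff_eq(s, f, q), as in (25)."""
    # prederivative: the elements whose condition f(f(x)) = q may change value
    s0 = {x for x in s if f.get(x) == y0 or x == y0}
    f[y0] = z                         # the indexed assignment itself
    for x in s0:                      # postderivative: re-test only the points of s0
        if f.get(f.get(x)) == q:
            c4.add(x)
        else:
            c4.discard(x)
```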

C.    Formal  Differentiation  of  Set  Theoretical  Expressions 
Continuous  in  All  of  Their  Parameters 

We now formulate a few general rules concerning the
formal differentiation of set-theoretic expressions
continuous in all of their free parameters. It must be
observed that none of the transformations which we are studying
can safely be applied to expressions containing operations
which cause side effects, for which reason we shall
always assume such operations to be absent in the expressions
we treat. We also assume that typefinding is applied
prior to any attempt to optimize by formal differentiation,
so that object types are known during the analysis of a program
for reduction (cf. [T1] for a method of type analysis for SETL).
Consider the set-theoretic expression

(1)  C = {x ∈ s | K(x)}

in which K(x) is any boolean-valued subexpression containing
only free occurrences of the bound variable x, and
containing no free instance of the set s. Recall that
a full calculation of (1), as performed by the following
standard procedure for setformers,

(2)  C := nullset;
     (∀x ∈ s)                          /* linear time search */
         if K(x) then C := C + {x};    /* execute K(x) */
         endif;
     end ∀;

can take O(#s × cost(K(x))) steps. An expression (1) is
continuous with respect to slight changes s := s + A to s,
since the prederivatives of (1) with respect to these changes
are also 'easy' calculations:

(3)  C := C + {x ∈ A | K(x)}.

Suppose  that  the  expression  (1)  is  used  in  a  strongly 
connected  region  R  and  that  the  following  conditions  hold: 

(i)    Inside R, s is only changed by slight modifications
of the form s := s + A, where A is a small set in
comparison with s.

(ii)    The  set  valued  variable  s  is  defined  on  entrance 
to  R. 

(iii)   Aside  from  s  all  other  parameters  on  which  (1) 
depends  are  region  constants  of  R. 

Then  we  can  formally   differentiate  the  expression  (1)  in  R. 
If  all  these  conditions  apply,  then  formal  differentiation 
of  (1)  is  accomplished  by  applying  the  following  rule. 

Rule  1.   We  begin  by  making  (1)  available  on  entrance  to  R. 
This  is  done  by  inserting  the  assignment  C  :=  {x  e  s  |  K(x)} 
into  R's  initialization   block.   Then,  at  each  point  p 
inside  R  where  the  value  of  s  changes  by   s  :=  s  +  A,  the 
value  of  C  (which  could  be  spoiled   at  p)  is  updated  by 
inserting  the  prederivative  code  (3) .   All  calculations  (1) 
are  redundant  in  R  and  can  be  replaced  by  uses  of  the 
variable  C. 
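Rule 1 can be illustrated with a before-and-after pair of Python functions (hypothetical names) for a region in which s grows only by slight additions and the predicate K is a region constant. The second version initializes C once, as in R's initialization block, and then applies the prederivative (3) before each change to s.

```python
def original_region(s, additions, k):
    """Region R before Rule 1: C = {x in s | K(x)} is recomputed at each use."""
    s = set(s)
    sizes = []
    for a in additions:
        s = s | a                           # s := s + A
        c = {x for x in s if k(x)}          # full calculation of (1)
        sizes.append(len(c))                # a use of C
    return sizes


def differentiated_region(s, additions, k):
    """The same region after Rule 1 (k is a region constant)."""
    s = set(s)
    c = {x for x in s if k(x)}              # inserted into R's initialization block
    sizes = []
    for a in additions:
        c |= {x for x in a if k(x)}         # prederivative (3), just before the change
        s |= a                              # s := s + A
        sizes.append(len(c))                # the use of C is now a variable read
    return sizes
```

Both versions compute the same sequence of values; only the second avoids rescanning s at every use of C.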

We  remark  here  that  the  rule  just  described  is 
complex  enough  to  illustrate  the  difficulties  bound  to  be 
encountered  in  any  serious  effort  to  automatically  guarantee 
improvement  in  running  time  by  formal  differentiation. 
Several  pragmatic  and  simplifying  assumptions  are  implicit 
in  our  expectation  that  formal  differentiation  of  (1)  is 
worthwhile: 

(i)    We only consider the cost of destructive assignments to
the set C, as when C undergoes slight modifications within (2)
and (3); we ignore the costs of making fresh copies of C on
occasions when only nondestructive assignments are possible.

(ii)   We pay no attention to the cost of rehashing elements
of C when C grows too large.

(iii)  We do not count the cost of increased garbage collection
which might be required after formal differentiation,
since extra space is required to store the values of available
expressions, possibly over large program regions.
If  these  assumptions  fail,  the  actual  effect  on  a  program's 
running  time  of  applying  Rule  1  may  be  undesirable.  (This 
effect  depends  on  such  facts  as  frequency  information  and 
relative  sizes  of  sets,  undecidable  at  compile  time.) 
Moreover,  to  replace  these  assumptions  and  restrictions  by 
others  may  lock  us  into  an  unrealistic  model  of  limited 
utility.   (For  further  discussion  of  this  point  refer  to 
Appendix  B.) 

Rule  1  can  be  used  to  derive  another  more  comprehensive 
rule  for  formal  differentiation  of  expressions  like  (1). 

Suppose that the boolean subexpression K of (1) contains
m free occurrences of the n-ary mapping symbol f. Suppose
also that these m occurrences of f appear in r different
terms,

$f(p_{11}(x), \ldots, p_{1n}(x)),\; f(p_{21}(x), \ldots, p_{2n}(x)),\; \ldots,\; f(p_{r1}(x), \ldots, p_{rn}(x))$

where $p_{ij}(x)$ represents the j-th parameter expression
(involving x, which is the bound variable of the setformer)
of the i-th term. Then as derivatives of (1) with respect
to indexed assignments $f(y_1, \ldots, y_n) := z$ we can use
either of the following code sequences:

        Relative Position    Derivative Code

(5)     p-2     s0 := {x ∈ s | p_11(x) = y_1 & ... & p_1n(x) = y_n
                               or ... or
                               p_r1(x) = y_1 & ... & p_rn(x) = y_n};
        p-1     C := C - {x ∈ s0 | K(x)};
        p       f(y_1, ..., y_n) := z;
        p+1     C := C + {x ∈ s0 | K(x)};

or

(5')    p-1     s0 := {x ∈ s | p_11(x) = y_1 & ... & p_1n(x) = y_n
                               or ... or
                               p_r1(x) = y_1 & ... & p_rn(x) = y_n};
        p       f(y_1, ..., y_n) := z;
        p+1     (∀x ∈ s0) if K(x) then C := C + {x}; else C := C - {x};
                endif;
                end ∀;
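The sequence (5') can be sketched in Python for an arbitrary K, assuming f is a dict from n-tuples to values, each f term of K is described by a function returning its argument tuple [p_i1(x), ..., p_in(x)], and these parameter expressions do not themselves involve f. The function name indexed_assign is hypothetical, and s0 is found by a scan only for brevity.

```python
def indexed_assign(c, s, f, terms, k, ys, z):
    """Execute f(ys) := z while keeping c == {x in s | k(x, f)}, as in (5').

    f     -- dict from n-tuples to values (an n-ary map)
    terms -- one function per f term in K, mapping x to its argument tuple
    k     -- the predicate K, evaluated against the current f
    """
    ys = tuple(ys)
    # p-1: the elements for which some f term in K is applied to [y_1, ..., y_n]
    s0 = {x for x in s if any(p(x) == ys for p in terms)}
    f[ys] = z                                    # p: the indexed assignment
    for x in s0:                                 # p+1: re-evaluate K only on s0
        if k(x, f):
            c.add(x)
        else:
            c.discard(x)
```

In the usage below K(x) is f(x, x+1) = f(x+1, x), with 0 standing in for an undefined value of f.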

It is not difficult to see that (5') has the same
effect as (5). Moreover, it can be shown that the validity
of (5) is a corollary of Rule 1. To see this, consider
the set $D_i = \{[p_{i1}(x), \ldots, p_{in}(x)] : x \in s\}$. Let $p_i$ be
the mapping whose domain is s and where $p_i(x) = [p_{i1}(x), \ldots, p_{in}(x)]$.
Then for any n-tuple $[y_1, \ldots, y_n]$, we have

$p_i^{-1}(y_1, \ldots, y_n) = \{x \in s \mid p_{i1}(x) = y_1 \,\&\, \ldots \,\&\, p_{in}(x) = y_n\}$.

If s changes by deletion of $p_i^{-1}(y_1, \ldots, y_n)$, then $D_i$
changes by deletion of the n-tuple $[y_1, \ldots, y_n]$. Moreover,
if s is modified by deletion of $\bigcup_{i=1}^{r} p_i^{-1}(y_1, \ldots, y_n)$, then
the n-tuple $[y_1, \ldots, y_n]$ is removed from the domain of all
the f terms occurring in (1). Next we observe that if C
is available on entrance to p (i.e., is available just
prior to the modification to f by the indexed assignment
$f(y_1, \ldots, y_n) := z$), and if $[y_1, \ldots, y_n] \notin \bigcup_{i=1}^{r} D_i$ just
before point p, then the statement $f(y_1, \ldots, y_n) := z$ does
not change any of the occurrences of f in (1). Consequently,
C is not spoiled by the indexed assignment, and it remains
available.

Suppose now that in expression (1) C is available on
entrance to the program point p. Then we would proceed as
follows: (1) at p-3, put s0 equal to the set
$\bigcup_{i=1}^{r} p_i^{-1}(y_1, \ldots, y_n)$; (2) at p-1, delete s0 from s; (3) at p-2,
update C in accordance with Rule 1; (4) at p+2, add s0 back
to s; and (5) at p+1, use Rule 1 again to update C. This
would give us the following code:
      p-3     s0 := {x ∈ s | p_11(x) = y_1 & ... & p_1n(x) = y_n
                             or ... or
                             p_r1(x) = y_1 & ... & p_rn(x) = y_n};
      p-2     C := C - {x ∈ s0 | K(x)};
(6)   p-1     s := s - s0;
      p       f(y_1, ..., y_n) := z;
      p+1     C := C + {x ∈ s0 | K(x)};
      p+2     s := s + s0;

In this code C is not spoiled by the statement
$f(y_1, \ldots, y_n) := z$. Hence, if C is available upon entrance
to p-3, then by Rule 1 we know that C remains available
on exit from p+2. And now finally, since in (6) the value
of the set s is the same before p-1 as after p+2, and
because s is not used between p-1 and p+2, the code (6)
is equivalent to that shown in (5). The assumption that
at least one of the parameters in each f term in K involves
x (the bound variable of the setformer) will usually cause
the set s0 to be small in comparison with s.

To show that (1) is continuous relative to differential
modifications of the form s := s + A (where A is small in
comparison with s) and also relative to indexed assignments
$f(y_1,\dots,y_n) := z$ occurring in a strongly connected region R,
we want to ensure that the pre- and postderivative code in (5)
and (5') can be made to consist of 'easy' calculations. For
this to be the case, the setformer

(7)   $$s_0 = \{x \in s \mid (p_{11}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{1n}(x) = y_n) \text{ or } \dots \text{ or } (p_{r1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{rn}(x) = y_n)\}$$

occurring at point p-2 of (5) and point p-1 of (5') must be
profitably reducible; this will be true if (7) is
continuous in s and f. By Rule 1 we know that (7) is
continuous in s. We shall see that (7) is also continuous with
respect to indexed assignments $f(w_1,\dots,w_n) := v$ occurring
in R if, for every such assignment, each parameter expression
$w_1,\dots,w_n$ is a region constant of R.

Before exhibiting actual inexpensive update code
supporting this claim, it is useful to look at the
situation from a somewhat different point of view. Let d be the
maximum depth of nesting of f terms contained within other
f terms in the boolean subpart K of (1) (e.g., the depth of
the term f(g(f(x + f(0)))) is 3). Then (7) has a maximum
nesting depth of d-1. We will show inductively that for
d = 1, 2, ..., (7) is continuous in f. For d = 1, (7) contains
no f terms and is trivially continuous. If the nesting depth
of (1) is d = 2, then the depth of (7) is 1, and (7) can be
reduced economically using either rule (5) or (5') together
with Rule 1. Next assume that any expression of the form (7)
whose depth d-1 is less than or equal to k-1 is continuous in f,
and consider the case d = k + 1. Since the depth of (7) is then
d-1 = k, application of rule (5) or (5') to (7) will produce an
expression s' which is of the same form as (7) but with
depth k-1. By hypothesis, s' is continuous in f. Thus, we
can conclude that (1) is continuous in f and s for any
depth d.

The previous remarks suggest that the derivative of
(1) with respect to f should be realized by first applying
rule (5) or (5') to (1); this gives rise to an expression
$s_1$ of the same form as (7). Then either (5) or (5') can
be applied to $s_1$ and to each successive $s_j$, $j = 2,\dots,d-1$,
emerging by use of (5) or (5'), until the (d-1)st derivative
is applied. The final expression $s_{d-1}$ produced by this
process will have zero depth and can be reduced by Rule 1.

This approach will often be feasible, but in general
it is not easy to say whether the d different auxiliary sets
which must be kept available in R as a result of the
transformations sketched above will overlap strongly and hence
require excessive space. Nor in general can we say how much
of an improvement in speed (if any) is gained by maintaining
all these sets, in addition to (1), by incremental calculations.

Fortunately  we  can  suggest  a  much  more  attractive 
transformation  which  introduces  fewer  auxiliary  calculations. 
Suppose  that  the  expression  (1)  is  used  in  a  strongly 
connected  region  R  and  that  the  following  conditions  hold: 

(i) The boolean valued subexpression K(x) contains m
free occurrences of an n-ary mapping f (each such
occurrence has at least one parameter expression involving x,
the bound variable of the set former); all other free
variables occurring in K are loop invariant.

(ii) The m occurrences of f in K begin r distinguishable
f terms, $f(p_{11}(x),\dots,p_{1n}(x)),\dots,f(p_{r1}(x),\dots,p_{rn}(x))$.

(iii) Inside R, s is only changed by slight
modifications of the form s := s + A, where A is a small set in
comparison with s, and f is only changed by indexed
assignments of the form

(8)   f(y_1, y_2, ..., y_n) := z.

Then  we  can  formally  differentiate  the  expression  (1)  in 
the  region  R.   The  differentiation  rule  is  as  follows: 

Rule  2.   (There  are  two  cases  to  consider.) 

Case 1. Consider the class of expressions (1) in
which, for i = 1,...,r and j = 1,...,n, each parameter
expression $p_{ij}(x)$ either does not involve x or does involve x
and can be symbolically transformed into a linear factor
of the form x*a+b. For any expression (1) in this class
(e.g., expressions (13) and (17) of section B), we can use
an inexpensive variant of (5) to update (1) with respect
to indexed assignments (8), even when the parameters $y_1,\dots,y_n$
appearing in (8) are not region constants. This variant
of (5) is obtained if we simplify the setformer (7) used
at point p-2 of (5) into the following efficient form,

(9)   $$\big(\text{if } q_1 \in s \mathbin{\&} \bigwedge_{j=1}^{n-1} (t_{1j} = t'_{1j}) \text{ then } \{q_1\} \text{ else nullset}\big) + \dots + \big(\text{if } q_r \in s \mathbin{\&} \bigwedge_{j=1}^{n-1} (t_{rj} = t'_{rj}) \text{ then } \{q_r\} \text{ else nullset}\big)$$

in which $q_i$, $t_{ij}$, and $t'_{ij}$ for i = 1,...,r and j = 1,...,n-1
are meta symbols denoting computed expressions.
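As a hedged illustration of Case 1, the sketch below computes the set former (7) in the form (9) for a binary f whose i-th f term is assumed to have the shape f(a_i*x + b_i, p_i2(x)) with integer a_i ≠ 0. Solving a_i*x + b_i = y_1 yields the candidate q_i, so at most r membership tests on s are needed instead of a scan of s. The function name `s0_linear` and the term representation are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical term representation: (a, b, p2) stands for f(a*x + b, p2(x)).
Term = Tuple[int, int, Callable[[int], int]]


def s0_linear(s: Set[int], terms: List[Term], y1: int, y2: int) -> Set[int]:
    """Form (9) of the set former (7) for an assignment f(y1, y2) := z."""
    result: Set[int] = set()
    for a, b, p2 in terms:
        # Solve a*x + b = y1; q is the q_i of (9) when an integer solution exists.
        if a != 0 and (y1 - b) % a == 0:
            q = (y1 - b) // a
            # "if q_i in s & t_i1 = t'_i1 then {q_i} else nullset"
            if q in s and p2(q) == y2:
                result.add(q)
    return result
```

The cost is O(r) regardless of #s, which is why y_1,...,y_n need not be region constants in this case.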

We derive (9) from (7) by the following straightforward
manipulation. Expand (7) into a union of setformers

$$\{x \in s \mid \bigwedge_{j=1}^{n} (p_{1j}(x) = y_j)\} + \dots + \{x \in s \mid \bigwedge_{j=1}^{n} (p_{rj}(x) = y_j)\}.$$

By assumption we know that within each set former
$\{x \in s \mid \bigwedge_{j=1}^{n} (p_{ij}(x) = y_j)\}$, $i = 1,\dots,r$, there must be
a parameter expression, say $p_{i1}(x)$, which involves x. Hence,
we can transform the equality $p_{i1}(x) = y_1$ into the form $x = q_i$
and rewrite each conjunction $\bigwedge_{j=1}^{n} (p_{ij}(x) = y_j)$ as

$$x = q_i \mathbin{\&} \bigwedge_{j=2}^{n} (p_{ij}(q_i) = y_j),$$

which involves only one occurrence of x. If we now substitute
$t_{i,j-1}$ for $p_{ij}(q_i)$ and $t'_{i,j-1}$ for $y_j$, $i = 1,\dots,r$, $j = 2,\dots,n$, each
resulting setformer $\{x \in s \mid x = q_i \mathbin{\&} \bigwedge_{j=1}^{n-1} (t_{ij} = t'_{ij})\}$
may be simplified to the following conditional expression,

$$\text{if } q_i \in s \mathbin{\&} \bigwedge_{j=1}^{n-1} (t_{ij} = t'_{ij}) \text{ then } \{q_i\} \text{ else nullset}.$$

Case 2. To reduce more general expressions whose depth
of nesting in f terms is greater than 1 (e.g., (20), (22),
and (23) of the last section), we must use a different method.
We will at first consider only situations in which each
parameter $y_1,\dots,y_n$ of indexed assignments $f(y_1,\dots,y_n) := z$
occurring in R is a region constant of R. In the next section,
however, this restriction will be cast off.
Suppose that R contains t different indexed assignments
to f, which we denote by $f(y_{11},\dots,y_{n1}) := z_1$, ...,
$f(y_{1t},\dots,y_{nt}) := z_t$. Then on entrance to R, we must
insert the following initializing code,

(10)  C := {x ∈ s | K(x)};                     /* expression (1) */
      s_0^(1) := {x ∈ s | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_j1)};
                                               /* set former (7) */
      s_0^(2) := {x ∈ s | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_j2)};
        ...
      s_0^(t) := {x ∈ s | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jt)};
                       /* based on f(y_1t,...,y_nt) := z_t */


Whenever s is modified in R, we then apply Rule 1 to update
C, $s_0^{(1)}, s_0^{(2)},\dots,s_0^{(t)}$. At each program point in R at which
f is changed, say by the assignment $f(y_{1\ell},\dots,y_{n\ell}) := z_\ell$,
we keep C, $s_0^{(1)},\dots,s_0^{(t)}$ available by executing either of
the following code sequences, which are based on (5) and (5').


(11)   s_0^(1) := s_0^(1) - {x ∈ s_0^(ℓ) | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_j1)};
         ...
       s_0^(t) := s_0^(t) - {x ∈ s_0^(ℓ) | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jt)};
       C := C - {x ∈ s_0^(ℓ) | K(x)};
       f(y_1ℓ,...,y_nℓ) := z_ℓ;     /* all s_0 sets are updated except s_0^(ℓ) */
       C := C + {x ∈ s_0^(ℓ) | K(x)};
       s_0^(1) := s_0^(1) + {x ∈ s_0^(ℓ) | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_j1)};
         ...
       s_0^(t) := s_0^(t) + {x ∈ s_0^(ℓ) | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jt)};

or

(11')  (∀x ∈ s_0^(ℓ))           /* executed just after f(y_1ℓ,...,y_nℓ) := z_ℓ */
          if K(x) then C := C + {x}; else C := C - {x}; end if;
          if or_{i=1..r} (&_{j=1..n} p_ij(x) = y_j1) then s_0^(1) := s_0^(1) + {x};
          else s_0^(1) := s_0^(1) - {x}; end if;
            ...                 /* all sets s_0^(m) except s_0^(ℓ) are updated */
          if or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jt) then s_0^(t) := s_0^(t) + {x};
          else s_0^(t) := s_0^(t) - {x}; end if;
       end ∀;

This keeps C available throughout R, so that all
calculations of (1) can be replaced by uses of C.

It is easy to see that (11') computes the same thing
as (11). To justify the code sequence (11), we need to
substantiate two claims: (1) $s_0^{(\ell)}$ cannot be spoiled by the
assignment $f(y_{1\ell},\dots,y_{n\ell}) := z_\ell$; and (2) for any m with
$1 \le m \le t$ and $m \ne \ell$, the derivative code shown in (11)
for $s_0^{(m)}$ is correct.

To justify claim (1), we consider the predicates
$Q_i(x) = \bigwedge_{j=1}^{n} (p_{ij}(x) = y_{j\ell})$, $i = 1,\dots,r$. Let w be an element
of $s_0^{(\ell)}$ just prior to the program point p at which
$f(y_{1\ell},\dots,y_{n\ell}) := z_\ell$ occurs. Then for some k, $1 \le k \le r$,
$Q_k(w)$ must hold before p. Moreover, we can choose k in
such a way that among those predicates $Q_1(w),\dots,Q_r(w)$
that hold before p, $Q_k(w)$ has a minimal depth d of nesting
in f terms. Consequently, $Q_k(w)$ must hold after p, and w
must be an element of $s_0^{(\ell)}$ after p. For otherwise, an f
term occurring in $Q_k(w)$ would be spoiled at p, and this
would imply that a predicate $Q_v(w)$, $v \ne k$, with a smaller
nesting depth than $Q_k(w)$ held just before p, which is a
contradiction.

In proving claim (2) it is useful to organize the
r different f terms occurring in (1) as follows. Let
$f(p_{k1}(x),\dots,p_{kn}(x))$, $k = 1,\dots,q$, be all those f terms
which never occur within any of the f terms of (1). (These
are all the f terms which also have no occurrences in $s_0^{(m)}$,
$m = 1,\dots,t$.) Then application of the transformation (5)
to update $s_0^{(m)}$, $1 \le m \le t$ and $m \ne \ell$, in a manner compensating
for the change $f(y_{1\ell},\dots,y_{n\ell}) := z_\ell$ to f results in the
following code:

      s_0 := {x ∈ s | or_{i=q+1..r} (&_{j=1..n} p_ij(x) = y_jℓ)};
      s_0^(m) := s_0^(m) - {x ∈ s_0 | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jm)};
      f(y_1ℓ,...,y_nℓ) := z_ℓ;
      s_0^(m) := s_0^(m) + {x ∈ s_0 | or_{i=1..r} (&_{j=1..n} p_ij(x) = y_jm)};

But since $s_0 \subseteq s_0^{(\ell)}$, it follows from the proof of (5)
(cf. the discussion of (6)) that we can replace occurrences
of $s_0$ by $s_0^{(\ell)}$ in the code just above.
The code generated by rule 2 can be improved by
eliminating redundancies in the expression $\{x \in s_0 \mid K(x)\}$,
which appears at locations p-1 and p+1 of (5). Suppose
we know that $s_0 = \bigcup_{i=1}^{r} R_i$, where $R_1,\dots,R_r$ are disjoint
sets. Then $\{x \in s_0 \mid K(x)\}$ can be rewritten as
$\bigcup_{i=1}^{r} \{x \in R_i \mid K(x)\}$. Suppose also that in each set
$\{x \in R_i \mid K(x)\}$, K(x) can be transformed (by elimination of
redundant operations) into an equivalent but easier to
evaluate expression $K_i(x)$. Then it may be worthwhile to
work with the partition $\{R_i\}$ of $s_0$ instead of $s_0$ and
to rewrite

$$\{x \in s_0 \mid K(x)\} \quad\text{as}\quad \bigcup_{i=1}^{r} \{x \in R_i \mid K_i(x)\}.$$

As an example of this, observe that if we let

$$R_i = \{x \in (s - \bigcup_{k=0}^{i-1} R_k) \mid p_{i1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{in}(x) = y_n\},$$

where $R_0 = \emptyset$, then $R_1,\dots,R_r$ form a partition of $s_0$.

Moreover, on the set $R_i$ we can replace the term
$f(p_{i1}(x),\dots,p_{in}(x))$, which appears in the expression K at
location p-1 of the code generated by rule 2, by $f(y_1,\dots,y_n)$
(cf. (5) above). This can lead to a version of line p-1
of (5) which is relatively easy to evaluate, and it is
therefore tempting to apply the same transformation to line
p+1 of (5). However, at location p+1 we cannot, even after
breaking up $\{x \in s_0 \mid K(x)\}$ into $\bigcup_{i=1}^{r} \{x \in R_i \mid K(x)\}$, simply
replace each term $f(p_{i1}(x),\dots,p_{in}(x))$ in K by z. This is
because the indexed assignment appearing in line p of (5)
changes f and may therefore cause some parameter $p_{ij}(x)$
appearing in $\{x \in R_i \mid K(x)\}$ within (5) and containing an
occurrence of f to have a value different from $y_j$. When
dealing with cases complicated enough for this problem to
arise, we can make use of a second, finer partition
$R'_1,\dots,R'_r$ of $s_0$, defined as follows. First set $R'_0 := \emptyset$ as
before. Next find all f terms $f_{i_1},\dots,f_{i_{r_0}}$ whose parameter
expressions involve no f term, and put

$$R'_1 := \{x \in s \mid p_{i_1 1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{i_1 n}(x) = y_n\},$$
$$R'_2 := \{x \in (s - R'_1) \mid p_{i_2 1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{i_2 n}(x) = y_n\}, \;\dots,$$
$$R'_{r_0} := \{x \in (s - \bigcup_{k=0}^{r_0-1} R'_k) \mid p_{i_{r_0} 1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{i_{r_0} n}(x) = y_n\}.$$

After this, find all f terms $f_{i_{r_0+1}}, f_{i_{r_0+2}},\dots,f_{i_{r_1}}$ which
do not belong to the set $F_1 = \{f_{i_1},\dots,f_{i_{r_0}}\}$ but whose
parameter expressions only contain f terms which do belong
to $F_1$. Define sets $R'_{r_0+1},\dots,R'_{r_1}$ by writing

$$R'_k := \{x \in (s - \bigcup_{h=0}^{k-1} R'_h) \mid p_{i_k 1}(x) = y_1 \mathbin{\&} \dots \mathbin{\&} p_{i_k n}(x) = y_n\}, \quad k = r_0+1,\dots,r_1.$$

Iterating this procedure sufficiently often, we will obtain
a partition $\{R'_1,\dots,R'_r\}$ which can be used to eliminate
redundant calculations of $f(p_{i1}(x),\dots,p_{in}(x))$ at both p-1
and p+1. More specifically, if we let $K(x)[t_1,\dots,t_n / u_1,\dots,u_n]$
denote the result of substituting the terms $t_1,\dots,t_n$ for
the terms $u_1,\dots,u_n$ occurring in K(x), we can replace the
code occurring at location p-1 (in rule 2) by

$$C := C - \bigcup_{i=1}^{r} \{x \in R'_i \mid K(x)[y_1,\dots,y_n / p_{i1}(x),\dots,p_{in}(x)]\}$$

and the code occurring at p+1 by

$$C := C + \bigcup_{i=1}^{r} \{x \in R'_i \mid \big(K(x)[y_1,\dots,y_n / p_{i1}(x),\dots,p_{in}(x)]\big)[z / f(y_1,\dots,y_n)]\}.$$

(Note that the immediately preceding formula describes two
successive steps of substitution.)

This general method allows the code used to reduce
various set former expressions in examples (20)-(25) of
Section B above to be generated automatically.

As an example of the redundancy elimination method
just outlined, consider the following expression

(12)   $$C = \{x \in s \mid f(f(f(x+1)+1)) = f(f(x+1)+1)\}.$$

Suppose that the mapping f is changed slightly by an indexed
assignment $f(y_0) := z$ which occurs at a program point p.
Then to update the value of (12) we proceed as follows.
First a partition $R_1, R_2, R_3$ is computed. Observe that this
partition contains three sets because only three different
f terms occur in the boolean subexpression in (12): these
are $f(x+1)$, $f(f(x+1)+1)$, and $f(f(f(x+1)+1))$. Since
$f(x+1)$ is the only f term of (12) whose parameter expression
involves no f term, we put $R_1 := \{x \in s \mid x+1 = y_0\}$.
Since the parameter part of $f(f(x+1)+1)$ involves $f(x+1)$,
we set

$$R_2 := \{x \in (s - R_1) \mid f(x+1) + 1 = y_0\} \quad\text{and}\quad R_3 := \{x \in (s - (R_1 + R_2)) \mid f(f(x+1)+1) = y_0\}.$$

The code generated to update (12) is then as follows:


(13)  R_1 := {x ∈ s | x + 1 = y_0};
      R_2 := {x ∈ (s - R_1) | f(x+1) + 1 = y_0};
      R_3 := {x ∈ (s - (R_1 + R_2)) | f(f(x+1) + 1) = y_0};
      C := C - {x ∈ R_1 | f(f(f(y_0) + 1)) = f(f(y_0) + 1)}
             - {x ∈ R_2 | f(f(y_0)) = f(y_0)}
             - {x ∈ R_3 | f(y_0) = y_0};
      f(y_0) := z;
      C := C + {x ∈ R_1 | f(f(z+1)) = f(z+1)}
             + {x ∈ R_2 | f(z) = z} + {x ∈ R_3 | z = y_0};

We note that the set former expressions defining $R_1$, $R_2$
and $R_3$ in (13) are continuous in all parameters with the
exception of $y_0$ (which we temporarily require to be a
region constant), so that they can be made available in
R economically using techniques already described.
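For concreteness, the sketch below transcribes (13) into Python, modelling f as a dict over a finite range of integers (an assumption made only so that every nested application is defined). On each $R_i$ the rewritten predicate no longer mentions x, so every set former in (13) reduces to either $R_i$ or the empty set; this is where the savings come from. `K12` and `assign_12` are illustrative names.

```python
from typing import Dict, Set


def K12(x: int, f: Dict[int, int]) -> bool:
    # boolean subpart of (12): f(f(f(x+1)+1)) = f(f(x+1)+1)
    return f[f[f[x + 1] + 1]] == f[f[x + 1] + 1]


def assign_12(s: Set[int], f: Dict[int, int], C: Set[int],
              y0: int, z: int) -> None:
    """Perform f(y0) := z, keeping C == {x in s | K12(x, f)} via (13)."""
    R1 = {x for x in s if x + 1 == y0}
    R2 = {x for x in s - R1 if f[x + 1] + 1 == y0}
    R3 = {x for x in s - (R1 | R2) if f[f[x + 1] + 1] == y0}
    # p-1: K12 with the f terms rewritten via the definitions of R1, R2, R3
    if f[f[f[y0] + 1]] == f[f[y0] + 1]:
        C -= R1
    if f[f[y0]] == f[y0]:
        C -= R2
    if f[y0] == y0:
        C -= R3
    f[y0] = z
    # p+1: the second substitution replaces f(y0) by z
    if f[f[z + 1]] == f[z + 1]:
        C |= R1
    if f[z] == z:
        C |= R2
    if z == y0:
        C |= R3
```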

As noted by Earley, the method of formal
differentiation which has been described can be extended in a useful
way to apply to various SETL expressions that implicitly
contain set formers. Among these are the forall iterator
(i.e., a (∀x ∈ s | K(x)) block), the existential and universal
quantifiers (i.e., ∃x ∈ s | K(x) and ∀x ∈ s | K(x)), and the
compound operator (i.e., [<binop>: x ∈ s | K(x)] e(x)).

To formally differentiate these expressions, we rewrite them
by replacing the implicit set former subpart x ∈ s | K(x)
which they contain with x ∈ {u ∈ s | K(u)}. The set former
subexpressions thus exposed can then be differentiated
using rules 1 and 2.

Let us now consider more closely the SETL compound
operation

(14)   C_1 = [binop: x ∈ C] e(x),

an illustrative example of which, [+: x ∈ C] e(x), calculates
the value $\sum_{x \in C} e(x)$. In general, [binop: x ∈ C] e(x) means
$e(x_1)$ binop ... binop $e(x_n)$, where $C = \{x_1,\dots,x_n\}$. For the
general case in which the binary operation binop has an
appropriate inverse, inverse_binop (e.g., arithmetic binary
+ with - as its inverse), we note that (14) is continuous
relative to slight changes in C; i.e., before an occurrence
of the code C := C + A or C := C - A, $C_1$ can be updated by an
appropriate inexpensive change, either

(15)   C_1 := C_1 binop [binop: x ∈ (A - C)] e(x);
or
(15')  C_1 := C_1 inverse_binop [binop: x ∈ (A * C)] e(x);

respectively.
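A minimal sketch of (15) and (15') for the case binop = + (so that inverse_binop is -), assuming e is an arbitrary integer-valued function; the function names are hypothetical. Only elements that are genuinely added to or removed from C are charged, which is why (15) ranges over A - C and (15') over A * C.

```python
from typing import Callable, Set


def add_elements(C: Set[int], total: int, A: Set[int],
                 e: Callable[[int], int]) -> int:
    """(15): update C1 = [+: x in C] e(x) for C := C + A, then perform the change."""
    total += sum(e(x) for x in A - C)   # only elements not already in C contribute
    C |= A
    return total


def remove_elements(C: Set[int], total: int, A: Set[int],
                    e: Callable[[int], int]) -> int:
    """(15'): update C1 for C := C - A using the inverse operation (-)."""
    total -= sum(e(x) for x in A & C)   # only elements actually present are removed
    C -= A
    return total
```

Each call costs O(#A) rather than the O(#C) of recomputing the sum.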
Applying the heuristic rule 'continuous functions of
continuous functions are continuous' to $C_1$ of (15) and C of (1)
yields update identities for a more general compound
operation form

(16)   C = [binop: x ∈ s | K(x)] e(x).

In  order  to  formally  differentiate  the  expression  (16)  in  a 
strongly  connected  region  R,  we  require  all  the  conditions 
imposed  on  (1)  to  hold,  and  also  require  that  neither  the 
set  s  nor  the  n-ary  mapping  symbol  f  occurring  in  K  should 
appear  in  the  subexpression  e  of  (16) .   If  all  these 
conditions  are  met,  we  differentiate  (16)  by  first  making 
it  available  on  entrance  to  R.   This  is  accomplished  by 
inserting  the  assignment  (16)  into  R's  initialization  block. 
Next,  within  R  at  each  point  p  where  C  can  be  spoiled  by 
'slight'  modifications  to  the  variables  s  or  f,  we  can  apply 
the  following  continuity  rules  for  (16)  that  parallel  rules 
1  and  2: 

Rule 3: Where s is modified in R by the code s := s + A or
s := s - A, the value of (16) can be maintained in C by executing

(17)   C := C binop [binop: x ∈ (A - s) | K(x)] e(x)
or
(17')  C := C inverse_binop [binop: x ∈ (A * s) | K(x)] e(x),

respectively.

A rule analogous to rule 2 can be stated to cover the case
of changes to f.
Rule 4. Suppose that the Boolean expression K(x) of (1)
contains m free occurrences of the n-ary mapping f. Use
the same notation and enabling conditions explained in
connection with Rule 2. Then at each point p in the
region R at which f is changed by an indexed assignment,
the following code transformation should also be made:

Relative Position   Derivative Code

p-2        s_0 := {x ∈ s | p_11(x) = y_1 & ... & p_1n(x) = y_n
                    or ... or
                    p_r1(x) = y_1 & ... & p_rn(x) = y_n};
p-1        C := C inverse_binop [binop: x ∈ (s_0 * s) | K(x)] e(x);
p          f(y_1,...,y_n) := z;
p+1        C := C binop [binop: x ∈ (s_0 * s) | K(x)] e(x);

It is easily seen that Rule 4 follows from Rule 3 in much
the same way that Rule 2 follows from Rule 1.

These rules imply continuity properties for many other
high level SETL operations. The counting operation applied
to a set former, i.e., #{x ∈ s | K(x)}, can be treated as
[+: x ∈ s | K(x)] 1. When side effects of the existential
and universal quantifiers can be ignored, the
corresponding SETL forms ∃x ∈ s | K(x) and ∀x ∈ s | K(x) can be
rewritten as [+: x ∈ s | K(x)] 1 ≠ 0 and [+: x ∈ s | ¬K(x)] 1 = 0,
respectively. Set inclusion (the predicate R ⊇ S)
is continuous in both S and R since, in SETL, R incs S can
be handled as [+: x ∈ S | x ∉ R] 1 = 0.
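As a sketch of the last observation, the Python class below (a hypothetical helper, not part of SETL) maintains n = [+: x ∈ S | x ∉ R] 1 under insertions into and deletions from both S and R, so that R incs S can be answered as n = 0 in constant time. Changes to S are handled as in Rule 3; a change to R alters the predicate x ∉ R only for the elements actually inserted into or deleted from R.

```python
from typing import Iterable, Set


class InclusionTracker:
    """Maintains n = [+: x in S | x notin R] 1, so that R incs S is n == 0."""

    def __init__(self, R: Iterable[int], S: Iterable[int]) -> None:
        self.R: Set[int] = set(R)
        self.S: Set[int] = set(S)
        self.n = sum(1 for x in self.S if x not in self.R)

    def add_to_S(self, A: Iterable[int]) -> None:       # (17), x in A - S
        new = set(A) - self.S
        self.n += sum(1 for x in new if x not in self.R)
        self.S |= new

    def remove_from_S(self, A: Iterable[int]) -> None:  # (17'), x in A * S
        gone = set(A) & self.S
        self.n -= sum(1 for x in gone if x not in self.R)
        self.S -= gone

    def add_to_R(self, A: Iterable[int]) -> None:
        new = set(A) - self.R
        # for x in new * S the predicate x notin R turns from true to false
        self.n -= sum(1 for x in new if x in self.S)
        self.R |= new

    def remove_from_R(self, A: Iterable[int]) -> None:
        gone = set(A) & self.R
        # for x in gone * S the predicate x notin R turns from false to true
        self.n += sum(1 for x in gone if x in self.S)
        self.R -= gone

    def r_incs_s(self) -> bool:
        return self.n == 0
```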
Although iterative operations formed using range
iterators, e.g., LO ≤ n < HI | K(n), can be differentiated
using the techniques already mentioned, range iterators
are frequently amenable to other reduction methods discussed
by Schwartz [Schb]. Consider as an example the following
existential quantifier,

(18)   ∃ LO ≤ n < HI | K(n)

where K does not depend on the integer valued variables
LO or HI. In this case, rather than differentiating the
full expression (18), we can reduce one or both range
boundaries so as to limit the size of the search which
implements (18).

In particular, if we want to reduce the lower boundary
of (18), we can rewrite (18) as

(19)   [min: LO ≤ n < HI | K(n)] n ≠ Ω

allowing the compound min operation to be differentiated.
The prederivative of C = [min: LO ≤ n < HI | K(n)] n with
respect to LO := LO - A is described by the following
update code,

(20)   if A > 0 then
          if ∃ LO - A ≤ n < LO | K(n) then C := n; end if;
       else C := [min: LO - A ≤ n < HI | K(n)] n;
       end if;
If an n-ary map f occurs in r distinguishable terms of K
and each such term depends on n, we can apply a rule
analogous to Rule 2 to differentiate C with respect to
indexed assignments $f(y_1,\dots,y_n) := z$. The update code
is

(21)   T := [LO ≤ n < C | or_{i=1..r} (&_{j=1..n} p_ij(n) = y_j)];
       f(y_1,...,y_n) := z;
       if ∃n ∈ T | K(n) then C := n;
       elseif ¬K(C) then C := [min: C+1 ≤ n < HI | K(n)] n;
       end if;

It is not difficult to see that the technique described
above can also be applied to other SETL operations such as
universal quantifiers ∀ LO ≤ n < HI | K(n), tuple formers
[LO ≤ n < HI | K(n)], and forall iterators
(∀ LO ≤ n < HI | K(n)) block, all of which depend
on range iterators.
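The updates (20) and (21) can be sketched in Python as follows. The sketch assumes that K and the f term parameters are callables reading a dict f, that f is unary, and that the value HI stands in for Ω when no n qualifies; `range_min`, `lower_lo` and `assign_min` are hypothetical names. As in (21), T is computed with the old f, before the assignment.

```python
from typing import Callable, Dict, List, Tuple

Pred = Callable[[int], bool]


def range_min(lo: int, hi: int, K: Pred) -> int:
    """[min: lo <= n < hi | K(n)] n, with hi standing in for OM if no n qualifies."""
    return next((n for n in range(lo, hi) if K(n)), hi)


def lower_lo(LO: int, A: int, HI: int, C: int, K: Pred) -> Tuple[int, int]:
    """(20): keep C = [min: LO <= n < HI | K(n)] n current across LO := LO - A."""
    if A > 0:
        n = range_min(LO - A, LO, K)       # search only the newly exposed interval
        if n < LO:
            C = n
    else:
        C = range_min(LO - A, HI, K)       # the interval shrank: recompute
    return LO - A, C


def assign_min(LO: int, HI: int, C: int, f: Dict[int, int], K: Pred,
               params: List[Callable[[int], int]], y: int, z: int) -> int:
    """(21): keep C current across the indexed assignment f(y) := z."""
    # T: the n below C whose f terms read f at y, computed with the old f
    T = [n for n in range(LO, C) if any(p(n) == y for p in params)]
    f[y] = z
    hit = next((n for n in T if K(n)), None)   # T is ascending: least such n
    if hit is not None:
        return hit
    if C < HI and not K(C):
        return range_min(C + 1, HI, K)
    return C
```

Every n < C outside T keeps its (false) value of K, so only T and, if K(C) fails, the interval above C ever need to be searched.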

D.    Differentiation of Expressions Containing Parameters
on Which They Depend Discontinuously.

Most SETL expressions are not continuous in all the
parameters on which they depend. For example, the set
former

(1)   C = {x ∈ s | f(x,q) > q}

is continuous in the set s and the mapping f, but it is
discontinuous relative to changes in the free variable q.
Suppose that the expression (1) occurs in a strongly
connected region R, and suppose also that all changes to s
and f are slight within R. In this situation the
difficulties caused by the discontinuous dependence of (1) on q
can be overcome, and the computation (1) can be moved out
of R by applying the general formal differentiation scheme
sketched in the introduction to the present thesis. That is,
we can perform the following steps:

i.    Define an initial mapping C := nullset on entrance to R.
ii.   Replace all computations (1) in R by the expression
         if q ∈ DOM C then C(q) else C(q) := {x ∈ s | f(x,q) > q}
      which either retrieves the value of a stored calculation
      of (1) from C, if such a value exists, or else computes (1)
      and records this value in the memo function C(q) for
      possible future use.

iii.  Whenever differential changes to s or f occur in R,
      modify each stored set C(q) according to rules 1 and 2
      (cf. section C) for all values q ∈ DOM C; i.e., execute
      the following prederivative code just before s := s + A:

(2)   (∀q ∈ DOM C) C(q) := C(q) + {x ∈ A | f(x,q) > q};

The basic pre- and postderivatives which keep C available
on exit from indexed assignments f(x_1,x_2) := z are
as follows:

(3)   (∀q ∈ DOM C) s_0(q) := {x ∈ s | x = x_1 & q = x_2};
                   C(q) := C(q) - {x ∈ s_0(q) | f(x,q) > q};
      end ∀;
      f(x_1,x_2) := z;
      (∀q ∈ DOM C) C(q) := C(q) + {x ∈ s_0(q) | f(x,q) > q};
      end ∀;

The case (1) deserves special treatment, since the sets
s_0(q) appearing in (3) can be calculated by (9) of Section C;
i.e., s_0(q) := if x_1 ∈ s & q = x_2 then {x_1} else nullset,
where x_1 and x_2 need not be region constants. As a result,
(3) can be transformed into a speedier equivalent version,

(3')  if x_1 ∈ s & x_2 ∈ DOM C & f(x_1,x_2) > x_2 then
         C(x_2) := C(x_2) - {x_1};
      end if;
      f(x_1,x_2) := z;
      if x_1 ∈ s & x_2 ∈ DOM C & z > x_2 then
         C(x_2) := C(x_2) + {x_1};
      end if;

iv.   For changes of q in R nothing more is needed.
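Steps i-iv can be sketched as a small Python class for the example (1), C = {x ∈ s | f(x,q) > q}. The sketch assumes that f is a dict keyed by pairs (x, q) and that the memo map C is a dict from q to sets; the class and method names are hypothetical. `value` implements step ii, `add_to_s` the prederivative (2), and `assign_f` the update (3').

```python
from typing import Dict, Set, Tuple

FPairs = Dict[Tuple[int, int], int]


class MemoSetFormer:
    """Memo map C: q -> {x in s | f(x, q) > q}, kept current as in steps i-iv."""

    def __init__(self, s: Set[int], f: FPairs) -> None:
        self.s, self.f = s, f
        self.C: Dict[int, Set[int]] = {}           # step i: C := nullset

    def value(self, q: int) -> Set[int]:           # step ii: memoised lookup
        if q not in self.C:
            self.C[q] = {x for x in self.s if self.f[(x, q)] > q}
        return self.C[q]

    def add_to_s(self, A: Set[int]) -> None:       # step iii, prederivative (2)
        new = set(A) - self.s                      # Rule 1 only needs A - s
        for q, Cq in self.C.items():               # cost grows with #C
            Cq |= {x for x in new if self.f[(x, q)] > q}
        self.s |= new

    def assign_f(self, x1: int, x2: int, z: int) -> None:   # update (3')
        if x1 in self.s and x2 in self.C and self.f[(x1, x2)] > x2:
            self.C[x2].discard(x1)
        self.f[(x1, x2)] = z
        if x1 in self.s and x2 in self.C and z > x2:
            self.C[x2].add(x1)
```

The loop in `add_to_s` makes objection (b) below concrete: its cost is proportional to the number of memoised values of q.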

The approach described by (i)-(iv) above will be
profitable when the execution frequency of the code
inserted by (ii) is great enough and when the maximum
number #C of calculations that need to be stored in C
is small enough. But if #C is large, three major
objections arise which can easily make this approach infeasible:

(a) storage of all the sets C(q) may be too expensive;

(b) updating all the sets C(q) when a parameter upon
    which C depends continuously is modified may waste
    more time than is saved by avoiding the calculation
    of C;

(c) storage of the domain DOM C of C may be too expensive.

Example (1) illustrates these three potential
objections. For a 'randomly' chosen f, some large percentage
of all the x ∈ s will belong to the set (1) for many q.
Hence, the sets C(q) will be large for many q. These sets
will often overlap, and when #C is large, storing them will
undoubtedly require much more space than is required by s.
It is not difficult to surmise that in those contexts
where objection (a) arises, objection (b) will also cause
trouble. Although the update code (3') and the
computations introduced in step ii above are inexpensive, the
work involved in computing the derivative (2) is directly
proportional to #C and can exceed the cost of the original
calculation (1) for large #C. The third objection (c) will
arise when excessive space is needed for storing the
domain of C and when q is a set or tuple valued variable
occurring in (1).

If we consider a general expression
$C = E(x_1,\dots,x_k,x_{k+1},\dots,x_n)$ that depends discontinuously
in R on changes to its free variables $x_{k+1},\dots,x_n$, then
objections (a), (b) and (c) can be extremely difficult to
overcome. The basic formal differentiation method sketched
in the Introduction requires that we use a map C to store
separate values of expressions E' instantiating E in
appropriately chosen values of $x_{k+1},\dots,x_n$. If for
$i = k+1,\dots,n$ we let $D_i$ refer to the range of values that
$x_i$ can have in R, then the maximum number of stored
calculations in C is $\prod_{i=k+1}^{n} \#D_i$. Even if each discontinuity
parameter $x_i$ is boolean valued, C can come to store as many
as $2^{n-k}$ different calculations.

One possible way of diminishing the number of values
that need to be stored in such cases is to group the
parameters $x_{k+1},\dots,x_n$ into subexpressions e of E which
depend only on $x_{k+1},\dots,x_n$. The advantage of this rests on
an elementary property of finite mappings: the cardinality
of the range of such a mapping never exceeds the size of
its domain. We can use this approach when each subexpression
e of E behaves like a finite map; i.e., when E is
applicative.

As an example of this, note that in dealing with an
expression

(4)   C = {x ∈ s | f(x + q1) = f(q1 + g(q2)) + q3}

we can reduce the number of values of C that need to be
stored by using a map C1(q1, f(q1 + g(q2)) + q3) instead
of C2(f, g, q1, q2, q3).

To reduce (4) to the map C1, we can take the following
steps:
(i)   On entrance to R, perform the initialization
C := nullset.

(ii)  Within R replace all calculations of C by uses of the
expression

      if    [b1,b2] := [q1, f(q1+g(q2))+q3] ∈ PROJECT(2,C)
      then  C(b1,b2)
      else  C(b1,b2) := {x ∈ s | f(x+b1) = b2}.

(iii) At each program point in R at which s undergoes a
'slight' change s := s + A, execute the following
prederivative code,

(5)   (∀[b1,b2] ∈ PROJECT(2,C)) C(b1,b2) := C(b1,b2)
                                   + {x ∈ A | f(x+b1) = b2};
      end ∀;

which is justified by Rule 1. Note that (5) can be
rewritten in an efficient form as follows:

(5')  (∀x ∈ (A - s), b1 ∈ DOM C | b2 := f(x+b1) ∈ DOM C{b1})
         C(b1,b2) := C(b1,b2) + {x};
      end ∀;

The cost of (5) is directly proportional to #C × #A, while
the cost of (5') is O(#DOM C × #A), where DOM C is the set
of first parameter values of C.

(iv)  Whenever an indexed assignment f(y) := Z occurs in R,
we can keep C available by use of Rule 2; which is to say
by executing the following code:

(6)   (∀[b1,b2] ∈ PROJECT(2,C)) s0(b1,b2) := {x ∈ s | x+b1 = y};
         C(b1,b2) := C(b1,b2) - {x ∈ s0(b1,b2) | f(x+b1) = b2};
      end ∀;
      f(y) := Z;
      (∀[b1,b2] ∈ PROJECT(2,C)) C(b1,b2) := C(b1,b2)
                                   + {x ∈ s0(b1,b2) | f(x+b1) = b2};
      end ∀;

This can be further optimized to yield the following code,

(6')  (∀b1 ∈ DOM C)
         if y-b1 ∈ s & b2 := f(y) ∈ DOM C{b1}
         then C(b1,b2) := C(b1,b2) - {y-b1};
         endif;
      end ∀;
      f(y) := Z;
      (∀b1 ∈ DOM C)
         if y-b1 ∈ s & Z ∈ DOM C{b1}
         then C(b1,Z) := C(b1,Z) + {y-b1};
         endif;
      end ∀;

It is easy to see that the computational cost of (6') is
proportional to #DOM C. Note that y need not be a region
constant in (6').

(v)   In keeping C available in R, we can ignore all modifi-
cations to the map g or to the free variables q1, q2, or q3
of (4). We note that by making use of a map C1 instead
of the more straightforward C2 to reduce (4), the method
of parameter regrouping just described excludes values
of the sets f and g, which are part of the domain of C2,
from the domain of C1, and thereby ameliorates objection (c)
as well as objections (a) and (b).
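As an illustration only, steps (ii) through (iv) can be sketched in Python, used here as an executable stand-in for SETL. The sketch assumes integer-valued s, f and g given as dicts, and stores C1 as nested dicts C[b1][b2] so that DOM C and DOM C{b1} are plain key sets; the function names lookup, add_elements and assign_f are hypothetical, not taken from the text.

```python
# Sketch of the regrouped memo map C1 for example (4):
#   C = {x in s | f(x + q1) = f(q1 + g(q2)) + q3}
# C is stored as nested dicts C[b1][b2] -> set, so DOM C is C.keys()
# and DOM C{b1} is C[b1].keys().  f and g are dicts over integers.

def lookup(C, s, f, g, q1, q2, q3):
    """Step (ii): retrieve C1(b1, b2), computing it only on a memo miss."""
    b1, b2 = q1, f[q1 + g[q2]] + q3
    inner = C.setdefault(b1, {})
    if b2 not in inner:
        inner[b2] = {x for x in s if f.get(x + b1) == b2}
    return inner[b2]

def add_elements(C, s, f, A):
    """Step (iii), form (5'): update C for s := s + A, then apply it."""
    for x in A - s:
        for b1, inner in C.items():
            b2 = f.get(x + b1)
            if b2 in inner:
                inner[b2].add(x)
    s |= A

def assign_f(C, s, f, y, Z):
    """Step (iv), form (6'): keep C correct across f(y) := Z."""
    for b1, inner in C.items():
        old = f.get(y)
        if y - b1 in s and old in inner:
            inner[old].discard(y - b1)   # x = y - b1 leaves block f(y)
    f[y] = Z
    for b1, inner in C.items():
        if y - b1 in s and Z in inner:
            inner[Z].add(y - b1)         # x = y - b1 joins block Z
```

Each update touches only the first-parameter values in DOM C, which mirrors the O(#DOM C × #A) and O(#DOM C) bounds claimed for (5') and (6').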

Although the last example illustrates some advantages
of 'regrouping', the transformations of (5) to (5') and
(6) to (6') go beyond 'regrouping' and illustrate more
powerful improvement techniques. In the remainder of this
section we will utilize these techniques systematically to
extend Rules 1 and 2 to additional methods for reducing
generalized set formers

(7)  C = {x ∈ s | K(x,q1,q2,...,qn)}.

This expression depends continuously on differential changes
to s and on indexed assignments to mappings f occurring
in K (provided that at least one occurrence of each
such f contains a parameter expression involving the bound
variable x). However, (7) depends discontinuously on
changes in the free variables q1,...,qn.

Suppose that we want to reduce (7) in a strongly
connected region R, and suppose that values of (7) are
stored in a map C(q1,...,qn). Then within R, and for all
changes of s by the operation s := s + A, the following
prederivative code can be generated by a straightforward
application of Rule 1:

(8)  (∀[q1,...,qn] ∈ PROJECT(n,C))
        C(q1,...,qn) := C(q1,...,qn) + {x ∈ A | K(x,q1,...,qn)};
     end ∀;

Inspection of (8) leads us to anticipate a time cost directly
proportional to #C × #A × Cost(K(x,q1,...,qn)). To update (7)
at indexed assignments f(y1,...,ym) := Z for which the f
terms involving x and occurring in K are
f(pi1(x,q1,...,qn),...,pim(x,q1,...,qn)) for i = 1,...,r,
the pre- and postderivative code generated by extending the
rule (5) of section C (i.e., the rule from which Rule 2 is
derived) would be as follows:

(9)  (∀[q1,...,qn] ∈ PROJECT(n,C))
        s0(q1,...,qn) := {x ∈ s | [or: 1 ≤ i ≤ r] [&: 1 ≤ j ≤ m]
                                     (pij(x,q1,...,qn) = yj)};
        C(q1,...,qn) := C(q1,...,qn)
                           - {x ∈ s0(q1,...,qn) | K(x,q1,...,qn)};
     end ∀;
     f(y1,...,ym) := Z;
     (∀[q1,...,qn] ∈ PROJECT(n,C))
        C(q1,...,qn) := C(q1,...,qn)
                           + {x ∈ s0(q1,...,qn) | K(x,q1,...,qn)};
     end ∀;


This code can sometimes be modified in the manner described
in the last section (cf. 2.C Ex. (9)-(11)) to yield code
which keeps s0 available in R via incremental 'easy'
calculations.

It is not difficult to see that computation of (8)
or (9) can be unacceptably expensive due to objections (a)
and (b). We also note that simple regrouping of disconti-
nuity parameters need not always remove these objections
successfully.

Nevertheless, these problems can be overcome in
cases in which s0 and C are continuous relative to differ-
ential changes in the continuity parameters of (7).
Fortunately, this holds for a few special cases of common
occurrence in SETL programs. These special cases, which
may be said to involve 'removable' discontinuities, include
set formers based on the elementary forms
{x ∈ s | f(x)}, {x ∈ s | f(x) = q}, {x ∈ s | f(x) ∈ q},
{x ∈ s | f(x) ∉ q}, {x ∈ s | q ∈ f(x)}, and
{x ∈ s | f(x) <relop> q} (where <relop> can be
<, ≤, >, or ≥). Set formers involving boolean valued
subexpressions which include terms q ∉ f(x) and f(x) /= q
are not directly susceptible to reduction.

In the following discussion we will show how to
reduce the basic set formers above and more general set
expressions built up from these by combining boolean
valued subexpressions of these elementary forms using the
logical connectives &, or, and not. We will also present
techniques for handling still more general set formers
containing unfavorable relational operations within their
boolean subexpressions.

A most important special case is
(10)  C = {x ∈ s | f(x) = t}.

To differentiate (10), where we assume that the values of
(10) are stored in a map C(t), we can apply Rule 1 and
are led to the prederivative

(11)  (∀q ∈ DOM C) C(q) := C(q) + {x ∈ A | f(x) = q};
      end ∀;

of (10) with respect to s := s + A.

We know from the preceding discussion of objection
(b) that if #DOM C is too large, the computation (11)
will be costlier than a full recalculation of (10). However,
it is only necessary to apply Rule 1 to those sets
C(q) that actually change, and C(q) will not change if
{x ∈ A | f(x) = q} is empty. Thus, the set DOM C appearing
in (11) can be replaced by

(12)  {q ∈ DOM C | (∃x ∈ A | f(x) = q)}

which is usually a smaller set than DOM C. (But (12) may be
costlier to compute.) Moreover, (12) can be transformed
into an 'easier' calculation:

(12') {f(x): x ∈ A | f(x) ∈ DOM C}

which computes the same set as (12) but in O(#A) time.
Also, since redundant calculations C(q) := C(q) + {x ∈ A | f(x) = q}
of (11) can be introduced without invalidating (11), the
set (12') need never be computed at all; we can simply
rewrite (11) as
(13)  (∀x ∈ A | f(x) ∈ DOM C) C(f(x)) :=
          C(f(x)) + {y ∈ A | f(y) = f(x)};
      end ∀;

Finally, after transforming the repeated assignment in (13)
to the equivalent

(14)  (∀y ∈ A | f(y) = f(x)) C(f(x)) := C(f(x)) + {y};
      end ∀;

and jamming the iterator of (14) with (13), we arrive at a
highly efficient prederivative

(15)  (∀x ∈ A | f(x) ∈ DOM C) C(f(x)) := C(f(x)) + {x};
      end ∀;

equivalent to (13) but requiring only O(#A) elementary steps.
Note here that we have demonstrated the correctness of (15)
informally by first noting a conceptually easy but ineffi-
cient metatransformation (8) and then by passing to (11) as
a particular but still inefficient instance of (8). The form
(15) is then derived easily using simplifying transformations.
We will use this approach repeatedly throughout this chapter.
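The gain from (11) to (15) can be seen in a small Python sketch, used as a stand-in for SETL. It assumes C is a dict from memoized values q to sets and f is a dict; the function names are hypothetical. Both forms leave C in the same state, but (11) visits every key of C while (15) visits each new element once.

```python
# C maps each memoized value q to {x in s | f(x) = q}; f is a dict.

def prederivative_11(C, f, A):
    """Form (11): visits every q in DOM C, about #DOM C * #A steps."""
    for q in C:
        C[q] |= {x for x in A if f[x] == q}

def prederivative_15(C, f, A):
    """Form (15): visits each x in A once, about #A steps."""
    for x in A:
        if f[x] in C:          # only blocks that are already memoized
            C[f[x]].add(x)
```

For example, with f = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'a'}, memo {'a': {1}, 'b': {2}} and A = {3, 4, 5}, both calls yield {'a': {1, 3, 5}, 'b': {2}}; the element 4 is ignored because 'c' has not been memoized.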

If the map f of (10) is changed within R by an indexed
assignment f(y) := Z, then, since the f depth of (10) is 1
and because the boolean subpart of s0 = {x ∈ s | x = y} is
linear in x and does not depend on t, the general derivative
computation (9) leads to the following version of Rule 2:

(16)  (∀q ∈ DOM C)
         s0 := {y} * s;
         C(q) := C(q) - {x ∈ s0 | f(x) = q};
      end ∀;
      f(y) := Z;
      (∀q ∈ DOM C) C(q) := C(q) + {x ∈ s0 | f(x) = q};
      end ∀;

Since in (16) s0 is a compiler generated variable assigned
only once, since the expression {y} * s is invariant in
(16), and since s0 will be useless on exit from (16), we
can remove the single assignment to s0 and replace all
uses of s0 in (16) by occurrences of {y} * s. We can
also apply the chain of transformations (11)-(15) to the
iterators in (16). This leads to the following improved
code:


(16') (∀x ∈ {y} * s | f(x) ∈ DOM C)
         C(f(x)) := C(f(x)) - {x};
      end ∀;
      f(y) := Z;
      (∀x ∈ {y} * s | f(x) ∈ DOM C)
         C(f(x)) := C(f(x)) + {x};
      end ∀;

We can also rearrange the iterators in (16') into the form
(∀x ∈ {y} | x ∈ s & f(x) ∈ DOM C) <BLOCK(x)>, which simplifies
immediately to if y ∈ s & f(y) ∈ DOM C then <BLOCK(y)>.
Our final Rule 2 variant is then

(16") if y ∈ s & f(y) ∈ DOM C then C(f(y)) := C(f(y)) - {y};
      endif;
      f(y) := Z;
      if y ∈ s & f(y) ∈ DOM C then C(f(y)) := C(f(y)) + {y};
      endif;

which can be computed in essentially constant time.
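A Python rendering of (16"), offered as a sketch under the same assumptions as above (C a dict of sets, f a dict, s a set; the name update_f is hypothetical), makes the constant-time behavior visible: at most one removal before the assignment and one insertion after it.

```python
def update_f(C, s, f, y, Z):
    """Form (16"): keep C(q) = {x in s | f(x) = q} correct across f(y) := Z."""
    # Pre-derivative: y leaves its old block, if that block is memoized.
    if y in s and f.get(y) in C:
        C[f[y]].discard(y)
    f[y] = Z
    # Post-derivative: y joins its new block, if that block is memoized.
    if y in s and f[y] in C:
        C[f[y]].add(y)
```

Note that when the new value Z has not been memoized, y simply disappears from the memo, which is correct because no stored set C(Z) exists to receive it.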

According to our standard method, we can differentiate
an expression (10) in a region R by inserting the code (15)
and (16") at appropriate program points within R, insert-
ing C := nullset on entrance to R, and replacing uses of
(10) in R by

(17)  if t ∈ DOM C then C(t) else C(t) := {x ∈ s | f(x) = t}.

But special characteristics of (10) can sometimes be
exploited to obtain several further improvements. It is
instructive to note that if we could determine at compile
time the range Dt of values that the discontinuity parameter
t can have in R, then on entrance to R we could store values
of (10) in C for all values of t in Dt. This could be done
by using the following efficient code,

(18)  (∀q ∈ Dt) C(q) := nullset; end ∀;
      (∀x ∈ s | f(x) ∈ Dt)
         C(f(x)) := C(f(x)) + {x};
      end ∀;

which computes the same thing as
      C := nullset;
      (∀q ∈ Dt) C(q) := {x ∈ s | f(x) = q};
      end ∀;

Consequently, uses of (10) in R could be replaced by the
simple retrieval operation C(t).
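A minimal Python sketch of (18), assuming (as the text stresses, hypothetically) that the range Dt is known in advance; the name precompute is illustrative. It fills every C(q), q ∈ Dt, in a single pass over s instead of one pass per q.

```python
def precompute(s, f, Dt):
    """(18): one pass over s fills C(q) = {x in s | f(x) = q} for all q in Dt."""
    C = {q: set() for q in Dt}       # every q in Dt gets an entry, even if empty
    for x in s:
        if f[x] in C:                # f(x) in Dt
            C[f[x]].add(x)
    return C
```

After this initialization every use of (10) is a plain dictionary lookup C[t].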

Since the method just described does not involve a
'memo' function, its overall computational cost is easy to
predict. Only the inexpensive calculations (15) and (16")
are introduced into R, while the cost of (18) is about the
same as that of (10), which we eliminate from R. Thus, we expect
a speedup considerably greater than that attained by the
previous method, which was complicated by overhead in memo
function maintenance operations such as (17).

Unfortunately, the method we have just described is
not generally applicable, since it requires determination
of the set Dt, which is in general undecidable at compile time.
Nevertheless, there is another similar approach which offers similar
speedup advantages over the standard method but does not
depend on any difficult analysis. This new approach
essentially replaces Dt by the image set f[s] (the image of f
restricted to s); this set includes all values of q for
which C(q) = {x ∈ s | f(x) = q} is nonnull. Whenever f[s]
is spoiled by changes in f and s, C can record new values
in memo function fashion, and it can also undergo differ-
ential modifications within derivative code. Consequently
we are able to remove all calculations of (10) from R and
replace them by if t ∈ DOM C then C(t) else nullset.

The following additional steps describe this method more
fully.

i.    On entrance to R, initialize C by executing

(19)  C := nullset;
      (∀x ∈ s) if f(x) ∈ DOM C then
                  C(f(x)) := C(f(x)) + {x};
               else
                  C(f(x)) := {x};
               endif;
      end ∀;

which costs about the same as (10) to execute.

ii.   Instead of (15), use the following variants:

(20)  /* for s := s + A */
      (∀x ∈ (A - s)) if f(x) ∈ DOM C then
                        C(f(x)) := C(f(x)) + {x};
                     else
                        C(f(x)) := {x};
                     endif;
      end ∀;

and

(20') /* for s := s - A */
      (∀x ∈ (A * s) | f(x) ∈ DOM C)
         C(f(x)) := C(f(x)) - {x};
      end ∀;

as prederivatives of (10) relative to s := s + A and
s := s - A respectively. Note that DOM C can change in (20) and (20').

iii.  After inserting into (16") code which updates DOM C,
we can use the following derivative for indexed assignments
to f:

(21)  if y ∈ s & f(y) ∈ DOM C then C(f(y)) := C(f(y)) - {y};
      endif;
      f(y) := Z;
      if y ∈ s then
         if f(y) ∈ DOM C then C(f(y)) := C(f(y)) + {y};
         else C(f(y)) := {y};
         endif;
      endif;

It should be clear that the transformations just
described can attain greater speedup than the standard
technique. What's more, when #Dt is large, a better
utilization of space is also achieved.
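Steps i through iii can be gathered into one small Python class, as a sketch only: Python stands in for SETL, the class name ImageIndex is hypothetical, f is assumed to be defined on every element that ever enters s, and empty blocks are deleted so that DOM C stays equal to the image f[s] (one possible reading of "DOM C can change").

```python
class ImageIndex:
    """C(q) = {x in s | f(x) = q}, kept for exactly the q in the image f[s]."""

    def __init__(self, s, f):
        self.s, self.f, self.C = set(s), dict(f), {}
        for x in self.s:                              # step i, (19)
            self.C.setdefault(self.f[x], set()).add(x)

    def query(self, t):
        """Replaces a use of (10): if t in DOM C then C(t) else nullset."""
        return self.C.get(t, set())

    def _drop(self, x):
        q = self.f[x]
        self.C[q].discard(x)
        if not self.C[q]:                             # keep DOM C equal to f[s]
            del self.C[q]

    def add(self, A):                                 # step ii, (20): s := s + A
        for x in A - self.s:
            self.C.setdefault(self.f[x], set()).add(x)
        self.s |= A

    def remove(self, A):                              # step ii, (20'): s := s - A
        for x in A & self.s:
            self._drop(x)
        self.s -= A

    def assign(self, y, Z):                           # step iii, (21): f(y) := Z
        if y in self.s:
            self._drop(y)
        self.f[y] = Z
        if y in self.s:
            self.C.setdefault(Z, set()).add(y)
```

Every query is a dictionary lookup, and no full recalculation of (10) ever occurs after initialization.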

The preceding results apply in an interesting way to
a class of set formers typified by

(22)  C = {x ∈ s | f(x) ∈ q},

where the free variable q is a set. Recall from section C
that (22) is continuous relative to small changes in s and
to indexed assignments to f. If q is changed by a computa-
tion q := q + A where #A << #q, then the corresponding
prederivative

(23)  C := C + {x ∈ s | f(x) ∈ A}

will often represent a small change to C. However, because
(23) still requires an iteration over s, this update compu-
tation will often be too expensive to allow profitable
reduction of (22).

For this reason, it is appropriate in handling (22)
to use the identity

{x ∈ s | f(x) ∈ A} = [+: b ∈ A] {x ∈ s | f(x) = b}.

The expressions C'(b) = {x ∈ s | f(x) = b} which then appear
can be differentiated by the methods sketched earlier in
the present section; i.e., by using the following code as
a first prederivative of (22) with respect to q := q + A,

(24)  (∀y ∈ A, w ∈ {u ∈ s | f(u) = y})
         C := C + {w};
      end ∀;

But to handle (24) efficiently we must formally differen-
tiate C' = {u ∈ s | f(u) = y} and store its values as
a map C'(y).
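A short Python sketch of this idea, under the assumption that the index C'(y) has already been built and is kept current by the methods for (10); the names build_index and grow_q are hypothetical. The update for q := q + A then reads only the blocks C'(y), y ∈ A, instead of scanning s.

```python
def build_index(s, f):
    """C'(y) = {u in s | f(u) = y} for every y in the image f[s]."""
    Cp = {}
    for u in s:
        Cp.setdefault(f[u], set()).add(u)
    return Cp

def grow_q(C, q, A, Cp):
    """Prederivative (24) for q := q + A, reading C' instead of scanning s."""
    for y in A:                      # y already in q only re-adds existing members
        C |= Cp.get(y, set())        # sets are updated in place
    q |= A
```

The cost is proportional to #A plus the size of the blocks actually added, independent of #s.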

Set formers like

(25)  C = {x ∈ s | f(x) ∉ q}

can be treated similarly. Since enlarging q can only remove
elements from (25), a prederivative of (25) with respect to
q := q + A is

(26)  C := C - {x ∈ s | f(x) ∈ A}

which leads at once to a more efficient prederivative

(26') (∀y ∈ A, w ∈ {u ∈ s | f(u) = y})
         C := C - {w};
      end ∀;

Formula (26') can then be improved in much the same way
as (24).

Set formers involving boolean valued subexpressions
based on comparison operations such as

(27)  C1 = {x ∈ s | f(x) < q}

can be treated as special cases of (22). To see this, let
M be the largest q value that needs to be considered, and
let m be the minimum value of {f(x): x ∈ s} over all f
and s that can appear. Putting sq := {b: m ≤ b < q}, we
see that (27) is equivalent to {x ∈ s | f(x) ∈ sq}.

If, for A > 0, q changes slightly by q := q + A, then
sq changes, also slightly, by

sq := sq + {b: q ≤ b < q + A}   /* for q := q + A */
or by
sq := sq - {b: q - A ≤ b < q}   /* for q := q - A */

Thus, to update C1, we can simply execute the prederivative
code

(28)  (∀q ≤ y < q+A, w ∈ {u ∈ s | f(u) = y})   /* for q := q+A */
         C1 := C1 + {w};
      end ∀;
or
      (∀q > y ≥ q-A, w ∈ {u ∈ s | f(u) = y})   /* for q := q-A */
         C1 := C1 - {w};
      end ∀;
which can be further improved by differentiating the
set formers C'(y) = {u ∈ s | f(u) = y} as in the last two
examples.

However, the total ordering T = (<, DOM C') can be
exploited to further optimize the iteration that appears
within (28), by techniques not generally applicable to (24).
To do this, on entrance to R we sort DOM C' in ascending
order and produce the predecessor and successor maps,
pred and succ, on T (where pred(n) is the maximum element
of DOM C'). Next, we make the assignment
k := [min: w ∈ DOM C' | w ≥ q] w, so that k is the least
element of DOM C' not yet accounted for in C1. Then we can
keep C1, pred, succ, and k available collectively in R by
efficient incremental calculations. Thus, whenever q is
changed 'slightly' by q := q + A or q := q - A, the
following efficient prederivatives can be executed,

(28')  (while k < q+A)          /* for q := q+A */
          C1 := C1 + C'(k);
          k := succ(k);
       end while;

and

(28")  (while pred(k) ≥ q-A)    /* for q := q-A */
          k := pred(k);
          C1 := C1 - C'(k);
       end while;

Note that (28') and (28") can run considerably faster than
(28) when DOM C' is sparse in the interval [m,M]. Whenever
s undergoes a change s := s + A, and if DOM C' is
maintained as a balanced tree, then the insertions and
deletions necessary to maintain pred, succ, and k
require about #A × log(#DOM C') steps.
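The following Python sketch of (28') and (28") is an illustration under stated assumptions: a sorted Python list stands in for the balanced tree, succ and pred become index arithmetic on that list, k is kept as a list index, and s is held fixed so that only q moves. The class name Threshold is hypothetical.

```python
from bisect import bisect_left

class Threshold:
    """Maintains C1 = {x in s | f(x) < q} as q moves, following (28') and (28")."""

    def __init__(self, s, f, q):
        self.Cp = {}
        for u in s:                                # C'(y) = {u in s | f(u) = y}
            self.Cp.setdefault(f[u], set()).add(u)
        self.keys = sorted(self.Cp)                # DOM C' in ascending order
        self.q = q
        self.k = bisect_left(self.keys, q)         # index of min w in DOM C' with w >= q
        self.C1 = {x for x in s if f[x] < q}

    def raise_q(self, A):                          # (28'): q := q + A
        while self.k < len(self.keys) and self.keys[self.k] < self.q + A:
            self.C1 |= self.Cp[self.keys[self.k]]
            self.k += 1                            # k := succ(k)
        self.q += A

    def lower_q(self, A):                          # (28"): q := q - A
        while self.k > 0 and self.keys[self.k - 1] >= self.q - A:
            self.k -= 1                            # k := pred(k)
            self.C1 -= self.Cp[self.keys[self.k]]
        self.q -= A
```

Each move of q costs time proportional to the number of distinct f-values crossed, not to the width A of the interval, which is the advantage claimed over (28) when DOM C' is sparse.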

Another class of special cases derives from

(29)  C = {x ∈ s | q ∈ f(x)},

a set former which, despite its close resemblance to (22),
must be handled very differently. While (22) is continuous
in all of its parameters, (29) is discontinuous in q. Thus
we must save the value of C in a map C(q). Fortunately,
however, the discontinuity which appears here is removable.
If we apply the general rule (8) to (29) we obtain the
prederivative code

(30)  (∀t ∈ DOM C) C(t) := C(t) + {x ∈ A | t ∈ f(x)}; end ∀;

for modifications s := s + A. This code can be improved by
extending the iteration not over all of DOM C but over
the smaller set

{t ∈ DOM C | (∃x ∈ A | t ∈ f(x))}

which can be written equivalently as

([+: x ∈ A] f(x)) * DOM C.

Further symbolic manipulation of (30) leads to the follow-
ing prederivative,

(30') (∀x ∈ A, t ∈ f(x) * DOM C) C(t) := C(t) + {x}; end ∀;

which is generally preferable to (30) (especially when
#f(x) << #DOM C).
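As a sketch only, (30') can be written in Python with f mapping each element to a Python set and C a dict holding memo entries for the q values requested so far; the name prederivative_30 is hypothetical. The inner loop runs over f(x) and tests membership in DOM C, so its cost depends on #f(x), not on #DOM C.

```python
def prederivative_30(C, s, f, A):
    """Form (30'): for s := s + A, visit t in f(x) * DOM C for each x in A."""
    for x in A:
        for t in f[x]:
            if t in C:               # t ranges over f(x) * DOM C
                C[t].add(x)
    s |= A
```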

If f undergoes a change f(y) := Z, then since
s0 = {x ∈ s | x = y} does not depend on q and can be
calculated efficiently on the fly, the general derivative
formula (9) realizes the following Rule 2 update computa-
tion,

(31)  s0 := if y ∈ s then {y} else nullset;
      (∀t ∈ DOM C)
         C(t) := C(t) - {x ∈ s0 | t ∈ f(x)};
      end ∀;
      f(y) := Z;
      (∀t ∈ DOM C) C(t) := C(t) + {x ∈ s0 | t ∈ f(x)};
      end ∀;

However, it is not difficult to see that (31) can be
rewritten more efficiently as

(31') if y ∈ s then (∀u ∈ f(y) * DOM C)
                       C(u) := C(u) - {y}; end ∀;
      endif;
      f(y) := Z;
      if y ∈ s then (∀u ∈ Z * DOM C)
                       C(u) := C(u) + {y}; end ∀;
      endif;
Finally, note that a suitable prederivative of (29) with
respect to the change f(y) := f(y) + A of f is given by
the following special case of (31'):

(32)  if y ∈ s then (∀u ∈ A * DOM C)
                       C(u) := C(u) + {y}; end ∀;
      endif;

Since the derivative rules (30'), (31') and (32)
conform to our standard reduction method, in order to
differentiate (29) fully we only need to initialize C
on entrance to R by executing C := nullset and to replace
all uses of (29) in R by the calculation

(33)  if q ∈ DOM C then C(q) else C(q) := {x ∈ s | q ∈ f(x)}.

However, by introducing (33), we fail to eliminate all cal-
culations of (29) from R; this imperils our chance of
attaining a worthwhile speedup. Nevertheless, we can improve
the handling of the memo function C for (29) as we did in
the previous special case (10), and can effectively eliminate
all uses of (29) from R.

To do this, note first of all that values of (29) stored
in C(q) are only meaningful (i.e., nonnull) when q is in the
range Q = [+: x ∈ s] f(x) of f restricted to s. Thus, by keeping
the restriction of C to Q available in R we can replace all
uses of (29) by the conditional expression
if q ∈ DOM C then C(q) else nullset.

The following steps achieve this result:
i.    On entrance to R execute

      C := nullset;
      (∀x ∈ s, t ∈ f(x))
         if t ∈ DOM C then C(t) := C(t) + {x};
         else C(t) := {x};
         endif;
      end ∀;

which takes O(#f) steps to compute but does the same thing
as

      C := nullset;
      (∀x ∈ s, t ∈ f(x)) C(t) := {y ∈ s | t ∈ f(y)};
      end ∀;

ii.   At points in R where s changes, execute the appropriate
prederivative, which will be either

      (∀x ∈ (A - s), t ∈ f(x))        /* for s := s+A */
         if t ∈ DOM C then C(t) := C(t) + {x};
         else C(t) := {x};
         endif;
      end ∀;
or
      /* for s := s-A */
      (∀x ∈ (A * s), t ∈ (f(x) * DOM C)) C(t) := C(t) - {x};
      end ∀;

(This is based on (30').) When #f(x) is uniformly bounded
by δ, and when δ << #s, the cost of these last
calculations is O(δ × #A), which we expect to be inexpensive
relative to (29).

iii.  For changes f(y) := Z, we can use the following deriva-
tive code:

      if y ∈ s then (∀u ∈ f(y) * DOM C)
                       C(u) := C(u) - {y};
                    end ∀;
      endif;
      f(y) := Z;
      if y ∈ s then (∀u ∈ Z) if u ∈ DOM C then
                                C(u) := C(u) + {y};
                             else
                                C(u) := {y};
                             endif;
                    end ∀;
      endif;

which executes in no more than O(δ) elementary steps.

iv.   The rule corresponding to (32) for updating C with
respect to f(y) := f(y) + A is

      /* for f(y) := f(y) + A */
      if y ∈ s then (∀u ∈ (A - f(y))) if u ∈ DOM C then
                                         C(u) := C(u) + {y};
                                      else
                                         C(u) := {y};
                                      endif;
                    end ∀;
      endif;

and

      /* for f(y) := f(y) - A */
      if y ∈ s then (∀u ∈ (A * f(y) * DOM C))
                       C(u) := C(u) - {y};
                    end ∀;
      endif;

both of these requiring only O(#A) steps.
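Steps i through iv for (29) can be collected into a Python sketch, again with Python standing in for SETL. The class name MemberIndex and its method names are hypothetical; f is assumed to map every element that can enter s to a Python set, and empty blocks are simply left in C, since a lookup of an empty block and a missing block both yield the empty set.

```python
class MemberIndex:
    """C(t) = {x in s | t in f(x)}, kept for every t in Q = union of f(x), x in s."""

    def __init__(self, s, f):
        self.s = set(s)
        self.f = {x: set(v) for x, v in f.items()}
        self.C = {}
        for x in self.s:                          # step i
            for t in self.f[x]:
                self.C.setdefault(t, set()).add(x)

    def query(self, q):                           # if q in DOM C then C(q) else nullset
        return self.C.get(q, set())

    def add(self, A):                             # step ii, s := s + A
        for x in A - self.s:
            for t in self.f[x]:
                self.C.setdefault(t, set()).add(x)
        self.s |= A

    def remove(self, A):                          # step ii, s := s - A
        for x in A & self.s:
            for t in self.f[x]:
                if t in self.C:
                    self.C[t].discard(x)
        self.s -= A

    def assign(self, y, Z):                       # step iii, f(y) := Z
        if y in self.s:
            for u in self.f.get(y, set()):
                if u in self.C:
                    self.C[u].discard(y)
        self.f[y] = set(Z)
        if y in self.s:
            for u in self.f[y]:
                self.C.setdefault(u, set()).add(y)

    def extend(self, y, A):                       # step iv, f(y) := f(y) + A
        if y in self.s:
            for u in A - self.f[y]:
                self.C.setdefault(u, set()).add(y)
        self.f[y] |= A

    def shrink(self, y, A):                       # step iv, f(y) := f(y) - A
        if y in self.s:
            for u in A & self.f[y]:
                if u in self.C:
                    self.C[u].discard(y)
        self.f[y] -= A
```

With #f(x) bounded by δ, each update touches at most δ or #A blocks, matching the O(δ × #A), O(δ) and O(#A) bounds stated in steps ii through iv.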

It is not difficult to predict with some assurance that
the method just proposed offers a better chance of speedup
than does the standard method.

The two expressions C1 = {x ∈ s | f(x) /= q} and
C2 = {x ∈ s | q ∉ f(x)} can be handled by transforming them
into the more convenient forms

s - {x ∈ s | f(x) = q}
and
s - {x ∈ s | q ∈ f(x)}

respectively. The set formers {x ∈ s | f(x) = q} and
{x ∈ s | q ∈ f(x)} thus exposed can usually be reduced profit-
ably by methods previously described.

Each of these examples (10), (22), (25), (27), and (29)
typifies the treatment of a broad class of expressions that
can often be differentiated profitably. Within the class
associated with (10) we consider the set formers

(34)  C = {x ∈ s | K1(x) = K2(q1,...,qn)},

where q1,...,qn are free variables upon which C depends
discontinuously. We assume that K1 of (34) is a subexpression
involving only x, parameters upon which (34) depends
continuously, and maps fi upon which C can depend discon-
tinuously but whose occurrences in K1 all have parameters
depending on x. K2 of (34) is assumed to be a subexpression
involving only the parameters q1,...,qn, on which C depends
discontinuously, and the maps fi.

We can treat (34) easily if we recognize that it is
constructed from (10) by a composition in which K1 replaces
f and K2(q1,...,qn) replaces t. To formally differentiate
(34) in a program region R we can store separate calculations
of (34) in a map C(K2(q1,...,qn)), proceeding in the following
steps:

i.    On entrance to R perform

      C := nullset;
      (∀x ∈ s) if K1(x) ∈ DOM C then
                  C(K1(x)) := C(K1(x)) + {x};
               else
                  C(K1(x)) := {x};
               endif;
      end ∀;

(This is based on (19).)

ii.   Whenever s changes, execute the corresponding
prederivative code,

(35)  (∀x ∈ (A - s)) if K1(x) ∈ DOM C then
                        C(K1(x)) := C(K1(x)) + {x};
                     else
                        C(K1(x)) := {x};
                     endif;
      end ∀;

for s := s + A, and

(35') (∀x ∈ (A * s) | K1(x) ∈ DOM C)
         C(K1(x)) := C(K1(x)) - {x};
      end ∀;

for s := s - A. (Both (35) and (35') can be derived
from (20) and (20') by substituting K1 for f.)

iii.  Modifications (in R) of q1,...,qn or of any map fi
occurring in K2 (but not in K1) of (34) do not require
insertion of update code.

iv.   Replace all occurrences of (34) in R by occurrences of

      if K2(q1,...,qn) ∈ DOM C then C(K2(q1,...,qn)) else nullset.
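The generalization from (10) to (34) amounts to replacing f by an arbitrary key function. The Python sketch below assumes, hypothetically, that K1 and K2 are passed in as plain functions and that the maps inside K1 do not change (indexed assignments to them are deferred in the text as well); the class name KeyedIndex is illustrative.

```python
class KeyedIndex:
    """C(v) = {x in s | K1(x) = v}; a use of (34) becomes a lookup at K2(q1,...,qn)."""

    def __init__(self, s, K1):
        self.s, self.K1, self.C = set(s), K1, {}
        for x in self.s:                            # step i, based on (19)
            self.C.setdefault(K1(x), set()).add(x)

    def add(self, A):                               # (35): s := s + A
        for x in A - self.s:
            self.C.setdefault(self.K1(x), set()).add(x)
        self.s |= A

    def remove(self, A):                            # (35'): s := s - A
        for x in A & self.s:
            v = self.K1(x)
            if v in self.C:
                self.C[v].discard(x)
        self.s -= A

    def value(self, K2, *qs):                       # step iv
        return self.C.get(K2(*qs), set())
```

Changing q1,...,qn costs nothing here, since they only select which block value returns, which is the point of step iii.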

In constructing a derivative of (34) with respect to
indexed assignments f(y1,...,ym) := Z, we cannot simply
make use of the update code (21) used for example (10),
because in deriving (21) we make use of characteristics of
(10) absent in (34). In fact, we cannot easily overcome
the inadequacies of our current formulation of Rule 2 to
make it apply usefully here. Hence, before giving the desired
derivative code for (34), we must iron out the lingering
difficulties of Rule 2. This we will do shortly, after first
studying the continuity properties of an important
generalization of (34).

It is possible to differentiate more general set formers
than the elementary ones just described by combining the
boolean subparts of these forms using the logical connectives
&, or, and not. Consider, as an example, the following set former,

(36)  {x ∈ s | [&: 1 ≤ i ≤ m] (Ki(x) = K'i(q1,...,qn)) & Km+1(x)}

in which for i = 1,...,m+1 the terms Ki are restricted in
the same way as K1 of (34), and for i = 1,...,m the terms K'i
are defined similarly to K2 of (34). If we use a map
C(K'1(q1,...,qn),...,K'm(q1,...,qn)) to store the values of
(36), and if we treat C in a manner similar to the improved
techniques used for handling (34), then the prederivatives
of C with respect to the changes s := s + A and s := s - A
can be written as

(37)  (∀x ∈ (A - s) | Km+1(x))     /* for s := s+A */
         if [K1(x),...,Km(x)] ∈ PROJECT(m,C) then
            C(K1(x),...,Km(x)) := C(K1(x),...,Km(x)) + {x};
         else
            C(K1(x),...,Km(x)) := {x};
         endif;
      end ∀;

and

      /* for s := s - A */
(37') (∀x ∈ (A * s) | Km+1(x) & [K1(x),...,Km(x)] ∈ PROJECT(m,C))
         C(K1(x),...,Km(x)) := C(K1(x),...,Km(x)) - {x};
      end ∀;

which generalize (35) and (35'). Note that if we consider
the set t to be initially null, then the prederivative code
(37) of {x ∈ t | [&: 1 ≤ i ≤ m] (Ki(x) = K'i(q1,...,qn)) & Km+1(x)}
with respect to a change t := t + s gives the appropriate code
which, on entry to R, stores initial values of (36) in C.
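A Python sketch of (37) and (37'), assuming hypothetically that the x-side terms K1,...,Km are given as a list of functions and the extra conjunct Km+1 as a predicate; tuples stand in for the m-tuple keys of PROJECT(m,C). As the text observes, initialization is just (37) applied to an initially empty set with A = s.

```python
class ConjunctionIndex:
    """C(v1,...,vm) = {x in s | K1(x)=v1 & ... & Km(x)=vm & Kextra(x)}."""

    def __init__(self, Ks, Kextra):
        self.Ks, self.Kextra = Ks, Kextra
        self.s, self.C = set(), {}                   # start from an empty set t

    def key(self, x):
        return tuple(K(x) for K in self.Ks)          # [K1(x),...,Km(x)]

    def add(self, A):                                # (37): s := s + A
        for x in A - self.s:
            if self.Kextra(x):
                self.C.setdefault(self.key(x), set()).add(x)
        self.s |= A

    def remove(self, A):                             # (37'): s := s - A
        for x in A & self.s:
            if self.Kextra(x) and self.key(x) in self.C:
                self.C[self.key(x)].discard(x)
        self.s -= A

    def value(self, *vs):                            # vs = K'1(q...),...,K'm(q...)
        return self.C.get(tuple(vs), set())
```

For instance, with key functions x % 2 and x // 10 and the extra conjunct x != 13, loading range(20) gives value(1, 1) == {11, 15, 17, 19}.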

Observe that if disjunctions rather than conjunctions
occur in (36), then continuity will ordinarily fail. However,
one important exception (cf. (7) of Section C)

(38)  s0(y1,...,ym) = {x ∈ s | [or: 1 ≤ i ≤ r] [&: 1 ≤ j ≤ m] (pij(x) = yj)}

deserves special attention. In this case we can show that
if the set former (38) is continuous in all of its parameters
except y1,...,ym, then s0 is continuous in all of its
parameters.

To differentiate (38) we take the following steps:

i.    On entrance to R execute

(39)  s_0 := nullset;
      (∀x ∈ s, 1 ≤ i ≤ r)
          if [p_i1(x),...,p_im(x)] ∈ PROJECT(m,s_0) then
              s_0(p_i1(x),...,p_im(x)) := s_0(p_i1(x),...,p_im(x)) + {x};
          else
              s_0(p_i1(x),...,p_im(x)) := {x};
          endif;
      end ∀;

(This is based on (37) and a kind of 'redundant discontinuity
parameter elimination'.) The cost of (39) is essentially the
same as the cost of computing the set former (38).

ii.   Within R, whenever s changes by s := s + A or s := s - A, execute the
prederivatives

(40)  (∀x ∈ (A - s), 1 ≤ i ≤ r)     /* for s := s + A */
          if [p_i1(x),...,p_im(x)] ∈ PROJECT(m,s_0) then
              s_0(p_i1(x),...,p_im(x)) := s_0(p_i1(x),...,p_im(x)) + {x};
          else
              s_0(p_i1(x),...,p_im(x)) := {x};
          endif;
      end ∀;

and

      /* for s := s - A */
(40') (∀x ∈ (A * s), 1 ≤ i ≤ r | [p_i1(x),...,p_im(x)] ∈ PROJECT(m,s_0))
          s_0(p_i1(x),...,p_im(x)) := s_0(p_i1(x),...,p_im(x)) - {x};
      end ∀;

(These are similar to (37) and (37').) Note that the computational
cost of either (40) or (40') is O(#A × r) steps, which is
inexpensive relative to a calculation of the setformer (38).
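To make steps i and ii concrete, here is a small Python sketch of (39), (40) and (40'). It assumes each alternative i is given as a hypothetical function `p(x)` that returns the m-tuple $[p_{i1}(x),\dots,p_{im}(x)]$. It represents $s_0$ as a dictionary from tuples to sets. Both representation choices are ours, not the thesis's.

```python
def init_s0(s, alternatives):
    """Code (39): s0[y] = {x in s | p(x) == y for some alternative p}."""
    s0 = {}
    for x in s:
        for p in alternatives:          # 1 <= i <= r
            s0.setdefault(p(x), set()).add(x)
    return s0


def add_elements(s, s0, A, alternatives):
    """Prederivative (40), then s := s + A.  Cost O(#A * r)."""
    for x in set(A) - s:
        for p in alternatives:
            s0.setdefault(p(x), set()).add(x)
    s |= set(A)                         # mutates the caller's set in place


def remove_elements(s, s0, A, alternatives):
    """Prederivative (40'), then s := s - A.  Cost O(#A * r)."""
    for x in set(A) & s:
        for p in alternatives:
            if p(x) in s0:
                s0[p(x)].discard(x)
    s -= set(A)                         # mutates the caller's set in place


# Usage with r = 2 and m = 1: s0[(y,)] = {x in s | x % 2 == y or x % 3 == y}
alts = [lambda x: (x % 2,), lambda x: (x % 3,)]
s = set(range(1, 7))
s0 = init_s0(s, alts)
remove_elements(s, s0, {3}, alts)
add_elements(s, s0, {9}, alts)
print(sorted(s0[(0,)]))  # [2, 4, 6, 9]
```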

iii.  The following variant of (11) of Section C is a derivative
of (38) with respect to indexed assignments to f:

(41)  (∀x ∈ s_0(y_1,...,y_m), 1 ≤ i ≤ r |
              [b_1,...,b_m] := [p_i1(x),...,p_im(x)] ≠ [y_1,...,y_m]
              & [b_1,...,b_m] ∈ PROJECT(m,s_0))
          s_0(b_1,...,b_m) := s_0(b_1,...,b_m) - {x};
      end ∀;
      f(y_1,...,y_m) := z;
      (∀x ∈ s_0(y_1,...,y_m), 1 ≤ i ≤ r |
              [b_1,...,b_m] := [p_i1(x),...,p_im(x)] ≠ [y_1,...,y_m])
          if [b_1,...,b_m] ∈ PROJECT(m,s_0) then
              s_0(b_1,...,b_m) := s_0(b_1,...,b_m) + {x};
          else
              s_0(b_1,...,b_m) := {x};
          endif;
      end ∀;

If there exists a bound δ on the sizes of the sets $s_0(y_1,\dots,y_m)$
which is small relative to #s, then the cost of (41), which
is proportional to δ × r, will be inexpensive relative to (38).

iv.   Suppose that an n-ary mapping g which undergoes indexed
assignments in R (and only such assignments) occurs in t
different terms in (38), $g(q_{11}(x),\dots,q_{1n}(x)),\dots,g(q_{t1}(x),\dots,q_{tn}(x))$.
Then the derivative code (which keeps $s_0$ current in R)
associated with indexed assignments to g is as follows,

(42)  (∀x ∈ h_0(v_1,...,v_n), 1 ≤ i ≤ r | [p_i1(x),...,p_im(x)] ∈ PROJECT(m,s_0))
          s_0(p_i1(x),...,p_im(x)) := s_0(p_i1(x),...,p_im(x)) - {x};
      end ∀;
      g(v_1,...,v_n) := z;
      (∀x ∈ h_0(v_1,...,v_n), 1 ≤ i ≤ r)
          if [p_i1(x),...,p_im(x)] ∈ PROJECT(m,s_0) then
              s_0(p_i1(x),...,p_im(x)) := s_0(p_i1(x),...,p_im(x)) + {x};
          else
              s_0(p_i1(x),...,p_im(x)) := {x};
          endif;
      end ∀;

where

      $h_0(w_1,\dots,w_n) = \{x \in s \mid \bigvee_{i=1}^{t} \bigwedge_{j=1}^{n} \big(q_{ij}(x) = w_j\big)\}$

is an auxiliary expression which must be differentiated in
R in the same manner as (38). Note that (42), like (41),
consists of 'easy' calculations.
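The derivative (42) can likewise be sketched in Python. In this hypothetical setup, g is a unary dictionary, and each alternative of (38) reads g through a key function. $h_0$ is built with the same construction as (39), using the argument functions $q_{ij}$. The names `build_index` and `assign_g` and the sample data are illustrative assumptions.

```python
def build_index(s, alternatives):
    """Same construction as (39): index[y] = {x in s | p(x) == y for some p}."""
    index = {}
    for x in s:
        for p in alternatives:
            index.setdefault(p(x), set()).add(x)
    return index


def assign_g(g, v, z, s0, h0, alternatives):
    """Derivative (42) wrapped around the indexed assignment g(v) := z."""
    affected = h0.get(v, set())      # h0(v): the x with some g-term reading g at v
    for x in affected:               # first loop of (42): drop x from its old keys
        for p in alternatives:
            k = p(x)
            if k in s0:
                s0[k].discard(x)
    g[v[0]] = z                      # the indexed assignment itself (g is unary here)
    for x in affected:               # second loop of (42): reinsert x under new keys
        for p in alternatives:
            s0.setdefault(p(x), set()).add(x)


# Usage: t = 2 g-terms, g(x) and g(x + 1), so p_i(x) = (g(q_i(x)),)
g = {0: "a", 1: "b", 2: "a", 3: "c", 4: "b"}
s = {0, 1, 2, 3}
qs = [lambda x: (x,), lambda x: (x + 1,)]
alts = [lambda x: (g[x],), lambda x: (g[x + 1],)]
s0, h0 = build_index(s, alts), build_index(s, qs)
assign_g(g, (2,), "c", s0, h0, alts)
print(sorted(s0[("c",)]))  # [1, 2, 3]
```

Only the elements in $h_0(v)$ are revisited. The cost therefore depends on the size of that set, not on #s.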

The method just presented leads in an obvious way to
inexpensive derivatives for expressions like (1) of
Section C, and also for (34) and (36) of the present section.
In fact, it supports a reformulation of Rule 2 which is both
more general and more efficient than that previously given
(cf. (10) and (11) of Section C). This rule is stated as
follows.

Rule 2.  Consider general set formers (7) satisfying the
following two conditions:

Condition 1.  The discontinuity parameters $q_1,\dots,q_n$
of (7) must be removable in the derivative (8); i.e., the
size of the iteration in (8) must be bounded independently
of #C, C being the mapping which stores separate values of (7).

Condition 2.  Suppose that $f_1,\dots,f_z$ are all those maps
which change in R and which also appear at the head of map
retrieval terms involving x within the boolean subpart of (7).
Then we require that, for i = 1,...,z, each change to $f_i$
within R must be an indexed assignment, and each $f_i$ term
which involves x within (7) must not also involve $q_1,\dots,q_n$.

For k = 1,...,z, we will denote the $r_k$ distinguishable
$f_k$ terms satisfying Condition 2 above by

      $f_k(p^k_{11}(x),\dots,p^k_{1m_k}(x)),\ \dots,\ f_k(p^k_{r_k1}(x),\dots,p^k_{r_km_k}(x)).$

Then to keep (7) available in R in the presence of indexed
assignments to $f_1,\dots,f_z$ and small changes to s, we must
also keep the auxiliary maps

      $s_k(y_1,\dots,y_{m_k}) = \{x \in s \mid \bigvee_{i=1}^{r_k} \bigwedge_{j=1}^{m_k} \big(p^k_{ij}(x) = y_j\big)\}, \quad k = 1,\dots,z$

available. Since these auxiliary maps
are of the same form as (38), they can each be initialized
on entrance to R by executing the code (39). They can also
be updated efficiently whenever s changes by using rules
(40) or (40'). Finally, at indexed assignments
$f_j(v_1,\dots,v_{m_j}) := W$ we can combine the techniques of (41)
and (42) to obtain the following derivative, which updates
$s_1,\dots,s_z$ and C collectively:

(42') (∀x ∈ s_j(v_1,...,v_m_j))
          (∀1 ≤ i ≤ r_1 | [p^1_i1(x),...,p^1_im_1(x)] ∈ PROJECT(m_1,s_1))
              s_1(p^1_i1(x),...,p^1_im_1(x)) := s_1(p^1_i1(x),...,p^1_im_1(x)) - {x};
          end ∀;
              .
              .
              .
          (∀1 ≤ i ≤ r_j | [y_1,...,y_m_j] := [p^j_i1(x),...,p^j_im_j(x)]
                                 ∈ PROJECT(m_j,s_j)
                           & [y_1,...,y_m_j] ≠ [v_1,...,v_m_j])
              s_j(p^j_i1(x),...,p^j_im_j(x)) := s_j(p^j_i1(x),...,p^j_im_j(x)) - {x};
          end ∀;
              .
              .
              .
          (∀1 ≤ i ≤ r_z | [p^z_i1(x),...,p^z_im_z(x)] ∈ PROJECT(m_z,s_z))
              s_z(p^z_i1(x),...,p^z_im_z(x)) := s_z(p^z_i1(x),...,p^z_im_z(x)) - {x};
          end ∀;
          /* INSERT derivative code at this point for */
          /* updating C with respect to s := s - {x}   */
      end ∀;
      f_j(v_1,...,v_m_j) := W;
      (∀x ∈ s_j(v_1,...,v_m_j))
          (∀1 ≤ i ≤ r_1)
              if [p^1_i1(x),...,p^1_im_1(x)] ∈ PROJECT(m_1,s_1) then
                  s_1(p^1_i1(x),...,p^1_im_1(x)) := s_1(p^1_i1(x),...,p^1_im_1(x)) + {x};
              else
                  s_1(p^1_i1(x),...,p^1_im_1(x)) := {x};
              endif;
          end ∀;
              .
              .
              .
          (∀1 ≤ i ≤ r_j | [p^j_i1(x),...,p^j_im_j(x)] ≠ [v_1,...,v_m_j])
              if [p^j_i1(x),...,p^j_im_j(x)] ∈ PROJECT(m_j,s_j) then
                  s_j(p^j_i1(x),...,p^j_im_j(x)) := s_j(p^j_i1(x),...,p^j_im_j(x)) + {x};
              else
                  s_j(p^j_i1(x),...,p^j_im_j(x)) := {x};
              endif;
          end ∀;
              .
              .
              .
          (∀1 ≤ i ≤ r_z)
              if [p^z_i1(x),...,p^z_im_z(x)] ∈ PROJECT(m_z,s_z) then
                  s_z(p^z_i1(x),...,p^z_im_z(x)) := s_z(p^z_i1(x),...,p^z_im_z(x)) + {x};
              else
                  s_z(p^z_i1(x),...,p^z_im_z(x)) := {x};
              endif;
          end ∀;
          /* INSERT derivative code here for          */
          /* updating C with respect to s := s + {x}   */
      end ∀;
Note that we can further optimize (42') by removing code
which updates any map $s_k$, k = 1,...,z, which does not
depend on $f_j$. At worst, the cost of (42') will be
$O(z \times \delta \times \sum_{i=1}^{z} r_i)$, where δ is a bound on the cardinality of
the sets $s_k(v_1,\dots,v_{m_k})$, k = 1,...,z.

The preceding reformulation of Rule 2 finally puts
this rule into a viable form. In part, this is due to the
fact that we can lift the previous restriction that all of
the parameters $v_1,\dots,v_{m_j}$ of the indexed assignment
$f_j(v_1,\dots,v_{m_j}) := W$ in (42') must be region constants.
Perhaps more importantly, it is a result of establishing
a greater coordination between applications of Rule 1 (or
its extension, (8)) and Rule 2. Since we derived Rule 2 from
the more basic Rule 1, it is not surprising that in all
examples presented in this thesis, whenever Rule 2's
enabling Condition 1 (which pertains to Rule 1) holds, so
does Condition 2 (which is more directly related to Rule 2).
It is not unreasonable, therefore, to expect that the
efficient reformulated Rule 2 can be applied in the same contexts
in which we can differentiate the setformer (7) with a fast
variant of Rule 1. Therefore, in the rest of this chapter, we will
deal mainly with formal differentiation rules based on Rule 1.

Next, consider

(43)  $\{x \in s \mid \bigwedge_{i=1}^{m} \big(\bar K_i(q_1,\dots,q_n) \in K_i(x)\big) \wedge K_{m+1}(x,b_1,\dots,b_t)\}$

in which $K_1, \bar K_1, \dots, K_m, \bar K_m$, and $K_{m+1}$ are restricted as before.
Then to compensate for a redefinition s := s + A of s,
we can execute the following prederivative,

(44)  (∀x ∈ A, [b_1,...,b_t] ∈ PROJECT(t,C),
           d_1 ∈ K_1(x) * DOM C{b_1,...,b_t}, ...,
           d_m ∈ K_m(x) * DOM C{b_1,...,b_t,d_1,...,d_{m-1}} |
           K_{m+1}(x,b_1,...,b_t))
          C(b_1,...,b_t,d_1,...,d_m) := C(b_1,...,b_t,d_1,...,d_m) + {x};
      end ∀;

Note that the boolean subparts of (36) and (43) can be
conjoined within a new setformer, which can then be
differentiated with respect to s := s ± A by using rules (37),
(37') and (44) together in an obvious way. However,
replacing any of these conjunctions with a disjunction will
usually prevent any profitable formal differentiation.
Two more examples worth considering are

(45)  $C_1 = \{x \in s \mid ([+: 1 \le i \le m \mid K_i(x) \notin Q_i]\;1) = 0 \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

and

(45') $C_2 = \{x \in s \mid ([+: 1 \le i \le m \mid K_i(x) \in Q_i]\;1) = 0 \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

whose continuity properties we can exploit in order to handle
the setformers

      $\{x \in s \mid \bigwedge_{i=1}^{m} \big(K_i(x) \in Q_i\big) \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

and

      $\{x \in s \mid \bigwedge_{i=1}^{m} \big(K_i(x) \notin Q_i\big) \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$
respectively equivalent to (45) and (45'). To reduce
(45) and (45'), we must first reduce their compound
operator subexpressions

      COUNT1(x) = [+: 1 ≤ i ≤ m | K_i(x) ∉ Q_i] 1   and
      COUNT2(x) = [+: 1 ≤ i ≤ m | K_i(x) ∈ Q_i] 1,

both of which are defined on the domain s. Once this
has been done, we can reduce (45) and (45') to maps
$C_1(b_1,\dots,b_v)$ and $C_2(b_1,\dots,b_v)$. The prederivatives of
COUNT1 and $C_1$ corresponding to modifications $Q_i := Q_i + A$
are given by

(46)  (∀q ∈ (A - Q_i), x ∈ {w ∈ s | K_i(w) = q})
          COUNT1(x) := COUNT1(x) - 1;
          if COUNT1(x) = 0 then
              (∀[b_1,...,b_v] ∈ PROJECT(v,C_1) |
                      K_{m+1}(x,b_1,...,b_t) or K_{m+2}(x,b_{t+1},...,b_v))
                  C_1(b_1,...,b_v) := C_1(b_1,...,b_v) + {x};
              end ∀;
          endif;
      end ∀;

The prederivatives of COUNT2 and $C_2$ for the same modifications are

(46') (∀q ∈ (A - Q_i), x ∈ {w ∈ s | K_i(w) = q})
          COUNT2(x) := COUNT2(x) + 1;
          if COUNT2(x) = 1 then
              (∀[b_1,...,b_v] ∈ PROJECT(v,C_2) |
                      K_{m+1}(x,b_1,...,b_t) or K_{m+2}(x,b_{t+1},...,b_v))
                  C_2(b_1,...,b_v) := C_2(b_1,...,b_v) - {x};
              end ∀;
          endif;
      end ∀;

Note that in both (46) and (46'), {w ∈ s | K_i(w) = q}
can be differentiated efficiently using methods presented
earlier (cf. discussion of (34)).
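The counting idea behind (45) and (46) can be sketched in Python. To keep the sketch small, it assumes that the $K_{m+1}$/$K_{m+2}$ part is always true. This assumption removes the b parameters, so $C_1$ becomes a single set. The list `K` of key functions and the sample data are hypothetical. The index `{w ∈ s | K_i(w) = q}` is built once, because s does not change here. The removal method is the symmetric update for $Q_i := Q_i - A$, which we add for completeness.

```python
class AllInQ:
    """Sketch of (45) with the K_{m+1}/K_{m+2} part omitted:
    C1 = {x in s | K_i(x) in Q_i for every i}, maintained through COUNT1."""

    def __init__(self, s, K, Q):
        self.K = K                                  # hypothetical list of K_i
        self.Q = [set(q) for q in Q]
        # index[i][q] = {w in s | K_i(w) == q}
        self.index = [{} for _ in K]
        for i, Ki in enumerate(K):
            for w in s:
                self.index[i].setdefault(Ki(w), set()).add(w)
        # COUNT1(x) = number of i with K_i(x) not in Q_i
        self.count1 = {x: sum(Ki(x) not in self.Q[i] for i, Ki in enumerate(K))
                       for x in s}
        self.C1 = {x for x in s if self.count1[x] == 0}

    def add_to_Q(self, i, A):
        """Analogue of (46) for Q_i := Q_i + A."""
        for q in set(A) - self.Q[i]:
            for x in self.index[i].get(q, ()):
                self.count1[x] -= 1
                if self.count1[x] == 0:
                    self.C1.add(x)
        self.Q[i] |= set(A)

    def remove_from_Q(self, i, A):
        """Symmetric update for Q_i := Q_i - A."""
        for q in set(A) & self.Q[i]:
            for x in self.index[i].get(q, ()):
                self.count1[x] += 1
                if self.count1[x] == 1:
                    self.C1.discard(x)
        self.Q[i] -= set(A)


# Usage: K_1(x) = x % 2, K_2(x) = x % 3, Q_1 = Q_2 = {0}
c = AllInQ(range(10), [lambda x: x % 2, lambda x: x % 3], [{0}, {0}])
c.add_to_Q(1, {1})
print(sorted(c.C1))  # [0, 4, 6]
```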

The techniques used to handle (45) and (45') also
apply to

(47)  $C_3 = \{x \in s \mid ([+: 1 \le i \le m \mid K_i(x) \in Q_i]\;1) \ne 0 \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

and

(47') $C_4 = \{x \in s \mid ([+: 1 \le i \le m \mid K_i(x) \notin Q_i]\;1) \ne 0 \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

which are equivalent to

      $\{x \in s \mid \big(\bigvee_{i=1}^{m} (K_i(x) \in Q_i)\big) \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

and

      $\{x \in s \mid \big(\bigvee_{i=1}^{m} (K_i(x) \notin Q_i)\big) \wedge K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)\}$

As in the previous examples, we make use of maps COUNT2 and
COUNT1 in order to reduce (47) and (47') to maps $C_3(b_1,\dots,b_v)$
and $C_4(b_1,\dots,b_v)$. The prederivative code sequences analogous
to (46) and (46') for updating COUNT2, COUNT1, $C_3$ and $C_4$
relative to changes $Q_i := Q_i \pm A$ are
(48)  (∀q ∈ (A - Q_i), x ∈ {w ∈ s | K_i(w) = q})     /* for Q_i := Q_i + A */
          COUNT2(x) := COUNT2(x) + 1;
          if COUNT2(x) = 1 then
              (∀[b_1,...,b_v] ∈ PROJECT(v,C_3) |
                      K_{m+1}(x,b_1,...,b_t) or K_{m+2}(x,b_{t+1},...,b_v))
                  C_3(b_1,...,b_v) := C_3(b_1,...,b_v) + {x};
              end ∀;
          endif;
      end ∀;

for $C_3$, and

(48') (∀q ∈ (A * Q_i), x ∈ {w ∈ s | K_i(w) = q})     /* for Q_i := Q_i - A */
          COUNT1(x) := COUNT1(x) + 1;
          if COUNT1(x) = 1 then
              (∀[b_1,...,b_v] ∈ PROJECT(v,C_4) |
                      K_{m+1}(x,b_1,...,b_t) or K_{m+2}(x,b_{t+1},...,b_v))
                  C_4(b_1,...,b_v) := C_4(b_1,...,b_v) + {x};
              end ∀;
          endif;
      end ∀;

for $C_4$.

Note that whenever the formula $K_{m+1}(x,b_1,\dots,b_t) \vee K_{m+2}(x,b_{t+1},\dots,b_v)$
appearing in (46), (46'), (48), and (48') can be simplified to conjunctions
of terms of the form K(x), K(x) = K'(q_1,...,q_n), or K'(q_1,...,q_n) ∈ K(x),
we can use various optimization techniques introduced
earlier in this chapter to speed up the iterations within
(46), (46'), (48), and (48'). Note also that the set formers

      $C_5 = \{x \in s \mid K_1(x) \in Q_1 \wedge K_2(x,b_1,\dots,b_t) \vee K_3(x,b_{t+1},\dots,b_v)\}$

and

      $C_6 = \{x \in s \mid K_1(x) \notin Q_1 \wedge K_2(x,b_1,\dots,b_t) \vee K_3(x,b_{t+1},\dots,b_v)\}$

are degenerate forms of (45), (46) and (45'), (46') respectively, and
can be differentiated directly without using the auxiliary
maps COUNT1 and COUNT2.

As an illustrative example, consider

(49)  $\{x \in s \mid \neg\big((K_1(x) \ne \bar K_1(q_1,\dots,q_n)) \vee (\bar K_2(q_1,\dots,q_n) \notin K_2(x)) \vee K_3(x) \vee K_4(x) \notin Q\big) \wedge K_5(x) = \bar K_5(q_1,\dots,q_n) \wedge \bar K_6(q_1,\dots,q_n) \in K_6(x) \wedge K_7(x)\}$

We can reduce (49) by storing its values in a map
$C(\bar K_2(q_1,\dots,q_n), \bar K_6(q_1,\dots,q_n), \bar K_1(q_1,\dots,q_n), \bar K_5(q_1,\dots,q_n))$,
which is continuous with respect to changes Q := Q + A to Q.
The following code is a fairly efficient prederivative of (49):

(50)  (∀q ∈ A, x ∈ {w ∈ s | K_4(w) = q}, b_1 ∈ K_2(x) * DOM C,
           b_2 ∈ K_6(x) * DOM C{b_1} | ¬K_3(x) & K_7(x) &
           [K_1(x),K_5(x)] ∈ PROJECT(2,C{b_1,b_2}))
          C(b_1,b_2,K_1(x),K_5(x)) := C(b_1,b_2,K_1(x),K_5(x)) + {x};
      end ∀;

This can be further improved by reduction of {w ∈ s | K_4(w) = q}.

We note in regard to the last several examples that
quantifiers appearing in the boolean subpart of setformers
can be rewritten as finite disjunctions and conjunctions.
Thus, in addition to the direct methods we have already
mentioned for handling quantifiers, we can also make use of
the techniques for differentiating general setformers (36),
(43), (45), (45'), (47) and (47').

The techniques and concepts described in this section
can be used to extend the basic continuity Rules 1 and 2 to
generalized set formers involving multiple iterators, i.e.

(51)  $C = \{[x_1,\dots,x_q] : x_1 \in t_1,\ x_2 \in t_2(x_1),\ \dots,\ x_q \in t_q(x_1,\dots,x_{q-1}) \mid K(x_1,\dots,x_q)\}$.

Here we assume that K does not contain any free occurrences
of any free parameters appearing in the set expressions
$t_1,\dots,t_q$. In order to reduce the total expression (51),
we must first be able to reduce all the subexpressions $t_i$
upon which (51) depends. Let us suppose, for the sake of
simplicity, that each expression $t_j$ in (51) is continuous in
all of its parameters other than $x_1,\dots,x_{j-1}$ (which we will
treat as 'discontinuity parameters'). Then, if the parameters
on which $t_j$ depends continuously undergo only small changes
in R, we know from the preceding analysis that $t_j$ is reducible;
to reduce it we store its values as a map $t_j(x_1,\dots,x_{j-1})$.

Since, for 2 ≤ j ≤ q, the range of values of $x_j$ depends on
the values of $x_1,\dots,x_{j-1}$, it is convenient to consider
the set

      $D_{[x_1,\dots,x_{j-1}]} = \{[x_1,\dots,x_{j-1}] : x_1 \in t_1,\ \dots,\ x_{j-1} \in t_{j-1}(x_1,\dots,x_{j-2})\}$

as the domain of the map $t_j(x_1,\dots,x_{j-1})$. Whenever a parameter
in which $t_j$ is continuous changes differentially, the values
of $t_j(x_1,\dots,x_{j-1})$ must be updated for all necessary values
of $[x_1,\dots,x_{j-1}] \in D_{[x_1,\dots,x_{j-1}]}$ (these sets of values are
defined by rules like those discussed earlier in the present
section); the actual update operations to be employed will
be defined by rules like those of Section C. Moreover,
since any change to $t_i(x_1,\dots,x_{i-1})$ can also change the
domains $D_{[x_1,\dots,x_{j-1}]}$ for j > i, and in particular can cause
them to increase, it will sometimes be necessary to calculate
additional map values $t_{i+1}(x_1,\dots,x_i),\dots,t_q(x_1,\dots,x_{q-1})$
when $t_i(x_1,\dots,x_{i-1})$ changes. Once $t_1,\dots,t_q(x_1,\dots,x_{q-1})$
are known to be reducible, (51) can be reduced by
replacing the subexpressions $t_1,\dots,t_q(x_1,\dots,x_{q-1})$ by the
corresponding maps and evaluating the result on
entrance to loop R. Then, directly after any of the maps
$t_i(x_1,\dots,x_{i-1})$ are updated within R, we can insert code
(derived in part from the code to update $t_i$) which updates (51).

As an example, suppose that $t_1,\dots,t_q(x_1,\dots,x_{q-1})$ are
all reducible to maps $t_1,\dots,t_q(x_1,\dots,x_{q-1})$, all these maps
having domains as described just above, and suppose that
a particular set expression $t_j$ has the form

      $\{x \in s \mid K'(x,x_1,\dots,x_{j-1})\}$.

If s is changed slightly within R by an assignment s := s - A,
then the appropriate prederivative is

(52)  (∀x_1 ∈ t_1, ..., x_{j-1} ∈ t_{j-1}(x_1,...,x_{j-2}))
          t_j(x_1,...,x_{j-1}) := t_j(x_1,...,x_{j-1}) - {x ∈ A | K'(x,x_1,...,x_{j-1})};
      end ∀;
      C := C - {[x_1,...,x_q] : x_1 ∈ t_1, ..., x_{j-1} ∈ t_{j-1}(x_1,...,x_{j-2}),
                x_j ∈ {x ∈ A | K'(x,x_1,...,x_{j-1})}, x_{j+1} ∈ t_{j+1}(x_1,...,x_j), ...,
                x_q ∈ t_q(x_1,...,x_{q-1}) | K(x_1,...,x_q)};
However, in the case of a differential modification s := s + A
of s, the situation is not so fortunate. In this case the
necessary update operations are described by the following
much more complicated nested loop, which updates old values
of $t_j$ and calculates new values of $t_{j+1},\dots,t_q$:

(53)  (∀x_1 ∈ t_1, ..., x_{j-1} ∈ t_{j-1}(x_1,...,x_{j-2}))
          t_j(x_1,...,x_{j-1}) := t_j(x_1,...,x_{j-1}) + {x ∈ A | K'(x,x_1,...,x_{j-1})};
          (∀x_j ∈ {x ∈ A | K'(x,x_1,...,x_{j-1})})
              /* right-hand sides denote full evaluations of the expressions t_{j+1},...,t_q */
              t_{j+1}(x_1,...,x_j) := t_{j+1}(x_1,...,x_j);
              (∀x_{j+1} ∈ t_{j+1}(x_1,...,x_j))
                  t_{j+2}(x_1,...,x_{j+1}) := t_{j+2}(x_1,...,x_{j+1});
                      .
                      .
                      .
                      (∀x_{q-1} ∈ t_{q-1}(x_1,...,x_{q-2}))
                          t_q(x_1,...,x_{q-1}) := t_q(x_1,...,x_{q-1});
      end ∀x_1, ..., end ∀x_{q-1};
      C := C + {[x_1,...,x_q] : x_1 ∈ t_1, ..., x_{j-1} ∈ t_{j-1}(x_1,...,x_{j-2}),
                x_j ∈ {x ∈ A | K'(x,x_1,...,x_{j-1})}, x_{j+1} ∈ t_{j+1}(x_1,...,x_j), ...,
                x_q ∈ t_q(x_1,...,x_{q-1}) | K(x_1,...,x_q)};

Despite the complexity of (53), it is not difficult to
see that the work implied by the differential update
calculations (52) and (53) is much less than the work necessary
to perform the total calculation (51).
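The update pattern of (52) and (53) can be sketched in Python for the small case q = 3, j = 2. In this sketch, $t_1$ is a fixed set, and $t_2(x_1) = \{x \in s \mid K'(x, x_1)\}$. $t_3(x_1,x_2)$ is computed by a hypothetical function `T3` that does not depend on s. The callables `Kp`, `T3` and `K` are illustrative assumptions. Unlike (52), the sketch also discards stale $t_3$ entries when their domain point disappears. That is a space-saving choice of ours.

```python
class TripleFormer:
    """C = {[x1, x2, x3] : x1 in t1, x2 in t2(x1), x3 in t3(x1, x2) | K(x1, x2, x3)}."""

    def __init__(self, t1, s, Kp, T3, K):
        self.t1, self.s = set(t1), set(s)
        self.Kp, self.T3, self.K = Kp, T3, K
        self.t2 = {x1: {x for x in self.s if Kp(x, x1)} for x1 in self.t1}
        self.t3 = {(x1, x2): set(T3(x1, x2)) for x1 in self.t1 for x2 in self.t2[x1]}
        self.C = {(x1, x2, x3) for (x1, x2), xs in self.t3.items()
                  for x3 in xs if K(x1, x2, x3)}

    def add_to_s(self, A):
        """Analogue of (53), then s := s + A."""
        fresh = set(A) - self.s
        for x1 in self.t1:
            new_x2 = {x for x in fresh if self.Kp(x, x1)}
            self.t2[x1] |= new_x2                  # update old values of t2
            for x2 in new_x2:                      # compute t3 at new domain points
                self.t3[(x1, x2)] = set(self.T3(x1, x2))
                self.C |= {(x1, x2, x3) for x3 in self.t3[(x1, x2)]
                           if self.K(x1, x2, x3)}
        self.s |= fresh

    def remove_from_s(self, A):
        """Analogue of (52), then s := s - A."""
        gone = set(A) & self.s
        for x1 in self.t1:
            old_x2 = {x for x in gone if self.Kp(x, x1)}
            self.t2[x1] -= old_x2
            for x2 in old_x2:
                for x3 in self.t3.pop((x1, x2), set()):
                    self.C.discard((x1, x2, x3))
        self.s -= gone


def brute(t1, s, Kp, T3, K):
    """Full recomputation of (51), used only for comparison."""
    return {(x1, x2, x3) for x1 in t1 for x2 in s if Kp(x2, x1)
            for x3 in T3(x1, x2) if K(x1, x2, x3)}


def Kp(x, x1):
    return x % x1 == 0


def T3(x1, x2):
    return {x1 + x2, x1 * x2}


def K(x1, x2, x3):
    return x3 % 2 == 0


tf = TripleFormer({1, 2}, range(1, 7), Kp, T3, K)
tf.add_to_s({8, 9})
tf.remove_from_s({2, 9})
print(tf.C == brute(tf.t1, tf.s, Kp, T3, K))  # True
```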

Thus we see that (51) is continuous in the collective
differential changes to every value that the map $t_j(x_1,\dots,x_{j-1})$
can take over its domain $D_{[x_1,\dots,x_{j-1}]}$; and since the
expression $t_j$ is continuous in differential changes to all
of its parameters except for $x_1,\dots,x_{j-1}$, we can conclude
that (51) is continuous relative to all of the continuity
variables of $t_j$.

As a final remark concerning examples (52) and (53),
we note that even when some or all of the computed set
expressions $t_i$ of (51) cannot be reduced profitably (because
the space required by the maps $t_i$ is excessive), we can
sometimes reduce (51) without reducing the maps $t_i$. This
is done by replacing (52) and/or (53) by the code

(54)  C := C + {[x_1,...,x_q] : x_1 ∈ t_1, ..., x_{j-1} ∈ t_{j-1}(x_1,...,x_{j-2}),
                x_j ∈ {x ∈ A | K'(x,x_1,...,x_{j-1})}, x_{j+1} ∈ t_{j+1}(x_1,...,x_j), ...,
                x_q ∈ t_q(x_1,...,x_{q-1}) | K(x_1,...,x_q)};

to be executed within R after s := s + A. Note that this
code will often be inferior to (52) and/or (53) in cases
where (52)/(53) can be used, since in (54) the expressions
$t_i$ must be recalculated repeatedly. However, since the set
{x ∈ A | K'(x,x_1,...,x_{j-1})} will generally be much smaller
and easier to calculate than {x ∈ s | K'(x,x_1,...,x_{j-1})},
the update operation (54) may be much less burdensome than
the full calculation (51).

Rule 2 generalizes to (51) much more smoothly than Rule 1.
Whenever an n-ary map f (all of whose occurrences in K of (51)
have at least one parameter involving a bound variable $x_i$)
is changed within R by an indexed assignment f(y_1,...,y_n) := z,
the code required to update the value of (51) is

(55)  s_2 := {[x_1,...,x_q] : x_1 ∈ t_1, ..., x_q ∈ t_q(x_1,...,x_{q-1}) |
                  p_11(x_1,...,x_q) = y_1 & ... & p_1n(x_1,...,x_q) = y_n or
                  ... or
                  p_r1(x_1,...,x_q) = y_1 & ... & p_rn(x_1,...,x_q) = y_n};
      C := C - {[x(1),...,x(q)] : x ∈ s_2 | K(x(1),...,x(q))};
      f(y_1,...,y_n) := z;
      C := C + {[x(1),...,x(q)] : x ∈ s_2 | K(x(1),...,x(q))};

where the notation used is like that which we have employed
in the preceding discussion of Rule 2.
One last example,

(56)  C = {e(x) : x ∈ s | K(x)}

will illustrate reduction techniques somewhat different
from those already mentioned. Expression (56) is continuous
with regard to changes s := s + A in s, and it has the
prederivative

(57)  C := C + {e(x) : x ∈ A | K(x)}.

However, Rule 1 does not generalize easily to give a
prederivative for (56) with respect to changes s := s - A.
One way of handling this difficulty is to transform (56)
to the form

(58)  C = {e(x) : x ∈ {w ∈ s | K(w)}}

and then reduce {w ∈ s | K(w)}. Another, more powerful method
begins by representing (56) as a multiset. That is, by making
use of an auxiliary map

      COUNT(q) = [+: x ∈ s | e(x) = q & K(x)] 1

we can rewrite (56) as the equivalent setformer
$C_1$ = {e(x) : x ∈ s | COUNT(e(x)) ≠ 0}, which is easier to
handle. $C_1$ can be differentiated profitably with respect
to changes s := s ± A; in particular, for s := s - A we can use
the following prederivative code,

(59)  (∀x ∈ (A * s) | e(x) ∈ DOM COUNT & K(x))
          COUNT(e(x)) := COUNT(e(x)) - 1;
      end ∀;
      C_1 := C_1 - {e(x) : x ∈ A | COUNT(e(x)) = 0};
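The multiset representation can be sketched in Python with `collections.Counter` standing in for COUNT. The callables `e` and `K` and the sample data are hypothetical. The removal method follows the decrement-then-prune pattern of (59). The insertion method pairs the analogue of (57) with the matching COUNT increments.

```python
from collections import Counter


class ImageSet:
    """Sketch of C1 = {e(x) : x in s | COUNT(e(x)) != 0}, with
    COUNT(q) = #{x in s | e(x) == q and K(x)} kept as a Counter."""

    def __init__(self, s, e, K):
        self.s, self.e, self.K = set(s), e, K
        self.count = Counter(e(x) for x in self.s if K(x))
        self.C1 = {q for q, n in self.count.items() if n != 0}

    def add_to_s(self, A):
        """COUNT increments plus the analogue of (57), then s := s + A."""
        for x in set(A) - self.s:
            if self.K(x):
                self.count[self.e(x)] += 1
                self.C1.add(self.e(x))
        self.s |= set(A)

    def remove_from_s(self, A):
        """Analogue of (59), then s := s - A."""
        A = set(A)
        for x in A & self.s:
            if self.K(x):
                self.count[self.e(x)] -= 1
        # Counter returns 0 for missing keys without inserting them
        self.C1 -= {self.e(x) for x in A if self.count[self.e(x)] == 0}
        self.s -= A


# Usage: e(x) = x % 3, K(x) = (x != 4)
img = ImageSet(range(10), lambda x: x % 3, lambda x: x != 4)
img.remove_from_s({1, 7})
print(sorted(img.C1))  # [0, 2]
```

A value leaves $C_1$ only when its count reaches zero. Removing one of several preimages of e(x) therefore leaves $C_1$ unchanged, which is exactly what plain Rule 1 could not express.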

To deal with more general set formers than those
already considered, we propose to transform them into
expressions involving, on the one hand, conveniently differentiable
subexpressions and, on the other hand, subexpressions
which are not amenable to reduction. Chapter 3 describes
a semiautomatic, interactive source-to-source transformational
system which can assist this process of program restructuring.

We have now given quite a number of illustrative
examples, and can move ahead to our next chapter, which
will discuss implementation algorithms and general
applications. However, before doing this, we shall pause momentarily
and note that there exists a whole class of transitive
closure algorithms which are amenable to improvement by our
transformations. The main part of such transitive closure
algorithms typically consists of while loops which iterate
a block of code until an existential quantifier becomes
false, i.e., they have the following general form:

      /* initialize variables */
(60)  (while ∃x ∈ s | K(x))
          <BLOCK>(x)
      END while

where <BLOCK> involves redefinitions s := s + A to s,
indexed assignments f(y_1,...,y_n) := z to a map f which
also has occurrences in K, and perhaps other kinds of
changes to variables on which K depends (we also assume
that <BLOCK> contains uses of x). The techniques presented
in this chapter indicate that (60) can often be transformed
by formal differentiation to yield a faster 'workset' version,

(61)  /* initialize variables */
      workset := {x ∈ s | K(x)};
      (while ∃x ∈ workset)
          <BLOCK'>(x)
      END while

where <BLOCK'> is formed from <BLOCK> by insertion of
derivative code to keep workset available. Still more
generally, there will exist situations in which
{x ∈ s | K(x)} involves discontinuity parameters q_1,...,q_n,
so that a map workset(q_1,...,q_n) is necessary to store
separate values of {x ∈ s | K(x)} within the loop of (61).
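The following Python sketch contrasts forms (60) and (61) on graph reachability. In this hypothetical instance, s is the set of reached nodes and done is the set of expanded nodes. K(x) is "x ∉ done", and the graph is an adjacency dictionary. The first function recomputes {x ∈ s | K(x)} on every iteration. The second maintains it incrementally, with derivative code placed at the two points where s and done change.

```python
def reach_high_style(graph, root):
    """Form (60): loop while some x in s satisfies K(x) = 'x not yet expanded'."""
    s, done = {root}, set()
    while True:
        pending = [x for x in s if x not in done]   # {x in s | K(x)}, recomputed
        if not pending:
            return s
        x = pending[0]
        done.add(x)                                  # changes a variable K depends on
        for y in graph.get(x, ()):
            s.add(y)                                 # s := s + {y}


def reach_workset(graph, root):
    """Form (61): workset = {x in s | x not in done} is kept current incrementally."""
    s, done = {root}, set()
    workset = {x for x in s if x not in done}
    while workset:
        x = workset.pop()                            # derivative of done := done + {x}
        done.add(x)
        for y in graph.get(x, ()):
            if y not in s and y not in done:         # derivative of s := s + {y}
                workset.add(y)
            s.add(y)
    return s


g = {1: [2, 3], 2: [4], 3: [4], 4: [1], 5: [6]}
print(sorted(reach_workset(g, 1)))  # [1, 2, 3, 4]
```

Both functions compute the same set. The high-style version rescans s on every iteration, whereas the workset version does constant work per edge.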
Many sorting, parsing, graph, and general problem solving
algorithms can be written in the form (60). Some of these
algorithms will be found in Appendix F.

The formal differentiation techniques we have
described allow algorithms to be written in a 'high style'
in which complicated manipulation of worksets can be avoided
at no cost, since these methods can directly generate faster
algorithms from these 'high style' versions. The perfection
of our methods will therefore enable programmers to use
powerful high level dictions to write clear high level
programs which can be transformed routinely into more
efficient low level versions (cf. Sch1-Sch12 for a
comprehensive discussion of this idea; cf. Sch11, Sch12, Sch2 for
particular high level dictions optimizable by formal
differentiation). Among other things, this will facilitate
correctness proofs of programs since, e.g., we can expect to
prove undifferentiated programs of the form (60) correct
more easily than their more complicated workset versions (61)
(cf. Sch10 for further elaboration of this thought).
III. An Implementation Design of Formal Differentiation of
Expressions Continuous in All of Their Parameters

A.    Overview  and  System  Design 

We shall now present a design for a system supporting
semiautomatic formal differentiation of expressions
continuous in all of their parameters. The design to be presented
is a first step toward a much larger task, namely an
interactive program transformation system which supports
formal differentiation of arbitrary expressions as well as
other related program transformations such as recursion
elimination. Practical implementation of such a project
would require development of a large system incorporating
strategies which choose between speedup goals and storage
limitations. In the present chapter we concentrate
exclusively on formal differentiation mechanisms which might be
included in such a system, and in fact will only deal with
expressions continuous in all of their parameters. This
design will then be used to suggest extensions which would
allow discontinuous cases to be handled.

The diagram below (Figure 1) gives an overview of our
proposed system. A system user is assumed to issue
instructions via an interactive terminal to a command processor,
which first validates his commands and then either performs
them or passes them on to an appropriate supporting routine.
The command processor signals successful or unsuccessful
completion of tasks at the user's terminal. System
utilities include the following:

1.  A parser reads a source program text, parses it, and
outputs an annotated parse tree form of the text in a file
called PARSEDIN for subsequent transformational manipulation.

2.  An unparser reads PARSEDIN, unparses it, and
outputs the resulting formatted program with statement
numbers at the user's terminal; this utility allows
the user to see the results of transformations applied
to the PARSEDIN file.

3.  Transformation generators, which are invoked by
the command processor, manipulate PARSEDIN.

To perform a particular transformation T, these
generators find information about T in a transformation
library, TRANSLIB. This information will include enabling
conditions, instructions on how to manipulate PARSEDIN,
and links to other transformations in TRANSLIB that can be
tried next. Each generator will check that all enabling
conditions are met before performing operations on the tree
file. The generators may also request user intervention
for difficult action or validation decisions which
cannot be handled automatically.

The  basic  design  of  the  transformational  implementation 
discussed  here  uses  ideas  of  Loveman  [L2]  and  Kibler, 
Standish, and Neighbors [KI1, ST1, ST2]. However, the
transformations dealt with here, and formal differentiation
in  particular,  are  considerably  more  complex  than  the  local 
syntactic  transformations  which  largely  concerned  the 
authors  just  cited.   Data  flow,  type  analysis,  value  flow, 
inclusion  and  membership  relations  [Sch8],  and  even  human 
intervention  will  sometimes  be  necessary  to  enable  correct 
application   of  the  transformations  which  the  system  we 
propose  will  apply. 

Our  proposal  also  differs  from  the  previous  work  by  our 
use  of  a  variant  of  SETL  as  a  source  language.   This  choice 
is  probably  necessary  to  eliminate  some  sophisticated 
theorem  proving  (which  would  be  called  for  in  a  lower  level 
language)  to  establish  enabling  conditions  for  the  powerful 
transformations  we  wish  to  use.   The  close  link  between 
language  and  transformations  becomes  clearer  when  we  view 
the effort required to justify the use of a transformation
as involving a temporary decompilation of a program from a
language of lower level of abstraction, in which the meaning
of a program is scattered globally in the text, into a language
of higher level, in which difficult semantic details are
exposed only locally. The cost of this temporary decompilation
can be minimized, however, if each transformation serves
to translate a primitive semantic feature used in a limited
context of a program at one language 'level' into a more
efficient implementation expressed concretely at a lower
level. Since some information obtainable from analysis of
more abstract versions of an algorithm will be lost at
lower  level  versions,  each  application  of  a  transformation 
must  be  chosen  carefully  to  avoid  dispersing  important 
program  facts  prematurely.   In  other  words,  we  are  aiming 
for  a   largely  top  down  program  manipulation  system. 

Of course, the success of any transformational approach
depends on the ease with which transformations can be seen
to apply, and this is related to the difficulty of formally
justifying their use. These factors all reflect the expressive
power of the programming language, and the ability of
this language to express everything from what Schwartz
calls the most concise base 'rubble' form of an algorithm
to its implementation version 'cobweb' [Sch10, Sch4].

B.    SYSTEM  DESCRIPTION 

(i) PARSER

We  avoid  unnecessary  technical  complications  by 
describing  a  parser  for  a  modified  subset  SUBSETL  of  SETL 
text (see Appendix A). SUBSETL lacks GO TO's and
labels, function and subroutine calls, and I/O, and allows
no side effects other than implicit assignments to the
bound variables of existential and universal quantifiers.
SUBSETL also contains type declarations (used for input
variables) of the form <type>(<varlist>); where <type>
can be INTEGER, BOOLEAN, TUPLE, SET, or MAP and <varlist>
is a list of variables separated by commas.
A type declaration placed at a point p in a source program
determines  the  type  of  all  program  variables  found  in  <varlist> 
whose  scope  reaches  p. 

The  parser  first  produces  a  parse  tree  version  of  a  SUBSETL 
source  program.   A  control  flow  graph  can  subsequently  be  worked 
out  on  the  parse  tree  in  preparation  for  data  flow  analysis  and 
type  analysis.   After  this  analysis,  the  parse  tree  T  will  have 
been  annotated  with  the  successor  and  predecessor  maps  (FSUCC 
and  FPRED  defined  on  the  nodes  of  T)  of  the  flow  graph,  the  map 
USETODEF which associates each variable use u with the set of
variable definitions which can reach u, the map DEFTOUSE which
associates each variable definition d with the set of variable uses
reached from d, and a map TYPE mapping nodes of T representing
expressions to {INTEGER, BOOLEAN, TUPLE, SET, MAP}.

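Since a definition d reaches a use u exactly when d belongs to USETODEF(u), DEFTOUSE is simply the inverse relation of USETODEF (and FPRED likewise inverts FSUCC). A minimal Python sketch, assuming each map is held as a dict from a node to a set of nodes; the helper name `invert` is ours, not the system's:

```python
def invert(rel):
    """Invert a set-valued map: if b is in rel[a], then a is in result[b].
    Used here to derive DEFTOUSE from USETODEF (or FPRED from FSUCC)."""
    inv = {}
    for use, defs in rel.items():
        for d in defs:
            inv.setdefault(d, set()).add(use)
    return inv
```

For example, `invert({"u1": {"d1", "d2"}, "u2": {"d2"}})` gives `{"d1": {"u1"}, "d2": {"u1", "u2"}}`.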
(ii)  UNPARSER 

The  purpose  of  the  unparser  is  to  obtain  a  source  listing 
of  the  PARSEDIN  file.   The  UNPARSER  algorithm  simply  prints 
out  the  leaves  of  the  parse  tree  from  left  to  right  along  with 
the  numbers  of  statements.   Each  statement  begins  a  new  line 
preceded  by  a  statement  number,  and  if  a  statement  cannot  fit 
on  one  line,  it  will  appear  on  subsequent  lines  indented  two 
columns  to  the  right.   In  this  way,  the  loop,  block,  and  state- 
ment structure  of  the  parse  tree  will  be  reflected  in  the 
indentation  of  the  text  produced  by  the  UNPARSER. 

The procedure found in Appendix E (i) represents a SETL
version of UNPARSER. The parse tree consists of a set of
nodes whose labels correspond to the syntactic types,
lexical types, and literals of the SUBSETL grammar.

PROGRAM is the root node, TSUCC maps nodes to tuples of blank
atoms representing ordered successor nodes, NUMBER maps
nodes labeled <statement> to statement numbers, and LEAF
is a function defined as true for leaf nodes and false
otherwise.
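The UNPARSER walk can be sketched in Python as follows. This is an illustration under assumptions, not the Appendix E procedure: TSUCC, NUMBER and the leaf labels become dicts, a node with no successors counts as a leaf, and the line-width wrapping of long statements is omitted. Each numbered statement opens a new row, nested statements are indented two more columns, and text that follows a nested statement (such as `end while;`) resumes on an unnumbered row at the enclosing statement's indentation.

```python
def unparse(root, tsucc, label, number):
    """Print parse-tree leaves left to right, one row per statement,
    indenting nested statements by two extra columns."""
    rows = []  # each row: [statement number or "", indent level, tokens]

    def walk(node, depth, home, row):
        # depth: indentation for statements found below node
        # home:  indentation of the statement whose text node belongs to
        if node in number:                 # <statement> node: open a numbered row
            row = [str(number[node]), depth, []]
            rows.append(row)
            home, depth = depth, depth + 1
        kids = tsucc.get(node, ())
        if not kids:                       # leaf: emit its token on the current row
            row[2].append(label[node])
            return row
        for kid in kids:
            row = walk(kid, depth, home, row)
            if kid in number:              # text after a nested statement
                row = ["", home, []]       # goes on a fresh unnumbered row
                rows.append(row)
        return row

    first = ["", 0, []]
    rows.append(first)
    walk(root, 0, 0, first)
    return [f"{num:>2} {'  ' * ind}{' '.join(toks)}"
            for num, ind, toks in rows if toks]
```

Empty rows (for example, after the last statement of a block) are dropped at the end rather than tracked during the walk.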

(iii) TRANSFORMATION GENERATORS

Our transformational approach derives from the ideas
of Loveman [L2] and Standish, Kibler, and Neighbors [KI1].
We describe a transformation using six of Loveman's seven
properties:

1. Name - identifies a transformation and describes
the parameter format for invocation;

2. Enabling condition - a predicate which must be satisfied
for the transformation to be performed;

3. Tree pattern to search for - the pattern must match a
section of the tree representation of source text in
order for the transformation to be performed;

4. Replacement rules - the transformational action to be
performed on the tree representation of source text;

5. Changes to global functions - transformational action
to be performed on such global functions as the USETODEF
and DEFTOUSE maps;

6.  Chaining  directions  -  these  are  instructions  which 
trigger  attempts  to  perform  other  transformations  after 
the  current  one  successfully  completes. 

Loveman's seventh mechanism, his 'improvement heuristic', we omit.

All the transformations of our system are encoded
as  6-tuples  of  the  form  just  mentioned  and  are  stored  in 
a  transformational  library,  TRANSLIB.   Aside  from  the 
assortment  of  standard  program  transformations  catalogued 
in  [ST2] ,  TRANSLIB  should  contain  a  variety  of  set  theoretic 
transformations  important  in  preparing  source  code  for 
formal differentiation and in cleaning up the messy transitional
code that formal differentiation leaves in its wake (cf. Appendix D
for a sampling of the transformations we propose to use).

Like the transformational systems of Loveman and
Standish, the one described here will treat enabling
conditions as ad hoc procedures which can interrogate global maps
such as the USETODEF links, gather simple information from
the annotated parse tree, or ask the user to validate
difficult program facts manually. For purely local syntactic
transformations, the tree pattern and replacement rules can
follow Standish's production system [KI1], serving as the
left-hand side and right-hand side strings of productions of the
form LHS => RHS. We let LHS and RHS contain pattern variables
(which are denoted by capital letters), literals, and
balanced pairs of parentheses. A pattern can be represented
uniquely as a tree; e.g., if a1, ..., a7 are either
literals or pattern variables, then the pattern
(a1 (a2 (a3 a4) a5) a6 a7) corresponds to the following
tree:

(1)   [tree diagram for this pattern not reproduced]
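The correspondence between a balanced-parenthesis pattern and its tree can be made concrete with a small recursive-descent reader. This Python sketch is ours: it assumes tokens are separated by whitespace or parentheses and represents each parenthesized group as a tuple, so a leaf is a plain string (a literal or a pattern variable).

```python
import re


def parse_pattern(text):
    """Turn a balanced-parenthesis pattern string into a nested-tuple tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok                 # literal or pattern variable: a leaf
        kids = []
        while tokens[pos] != ")":      # children until the matching ")"
            kids.append(node())
        pos += 1                       # consume ")"
        return tuple(kids)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the pattern")
    return tree
```

For example, `parse_pattern("(a1 (a2 (a3 a4) a5) a6 a7)")` yields `("a1", ("a2", ("a3", "a4"), "a5"), "a6", "a7")`; an unbalanced string raises an exception.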


We say that a pattern (1) matches a parse tree P if
(i) (1) has the same structure as P down to the leaves of (1),
(ii) all literals of (1) match the corresponding leaves of P, and
(iii) all occurrences of a single pattern variable a in (1)
match the same subtree of P.

The procedure MATCH in Appendix E (ii) is a SETL version of
an algorithm which matches a pattern tree (the argument
'PATTERN') to a parse tree (the argument 'TREE'). PSUCC and
TSUCC are the respective successor maps for the pattern and
parse trees. The boolean function LEAF is defined as true
for nodes having no successors and false otherwise. LITERAL
is a boolean valued function which returns true for leaf nodes
of a pattern tree which are not pattern variables.
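Conditions (i) to (iii) translate directly into a recursive matcher. The Python sketch below is not the Appendix E procedure: it assumes nested tuples stand in for the PSUCC/TSUCC successor maps, treats a pattern string starting with a capital letter as a pattern variable (following the convention above), and returns the variable bindings on success or `None` on failure.

```python
def match(pattern, tree, bindings=None):
    """Match a nested-tuple pattern against a nested-tuple parse tree."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str):
        if pattern[:1].isupper():                    # pattern variable
            if pattern in bindings:                  # (iii) same subtree each time
                return bindings if bindings[pattern] == tree else None
            return {**bindings, pattern: tree}
        return bindings if pattern == tree else None  # (ii) literal must agree
    if isinstance(tree, str) or len(tree) != len(pattern):
        return None                                  # (i) shapes differ
    for p, t in zip(pattern, tree):
        bindings = match(p, t, bindings)
        if bindings is None:
            return None
    return bindings
```

With this sketch, `match(("X", "or", "X"), ("a", "or", "b"))` returns `None` because the two operands differ, which mirrors the behavior described next for the production (X or X) => (X).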

Note  in  connection  with  the  foregoing  that  expressions 
enclosed  in  parentheses  and  pattern  variables  match  subtrees. 
Consequently, a production P: (X or X) => (X) will not match
the  tree  (2),  since  the  subtrees  on  both  sides  of  the  'or' 
are  not  the  same. 
(2)   [parse tree diagram not reproduced; surviving node labels: <EXPN>, <FACTOR>, <ATOM>, A]


We  will  use  the  straightforward  match-replacement 
technique   just  described.   However,  many  transformations 
such  as  formal  differentiation  cannot  be  performed  by  a 
simple  rewrite  rule.   To  handle  these  more  general  proce- 
dures we  let  both  the  tree  pattern  and  the  replacement 
rule  entry  in  a  formal  differentiation  6-tuple  consist  of 
code  procedures. 
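For the simple rewrite rules, replacement amounts to copying the RHS and substituting each pattern variable with the subtree it matched. A Python sketch under the same nested-tuple assumption used above; the rule in the example is only a hypothetical approximation of SETEQNL based on its effect in the session below (S = nullset becoming ([+: u ∈ S] 1) = 0), not its actual TRANSLIB entry.

```python
def instantiate(rhs, bindings):
    """Build a replacement subtree from an RHS pattern and match bindings.
    Pattern variables are strings starting with a capital letter."""
    if isinstance(rhs, tuple):
        return tuple(instantiate(part, bindings) for part in rhs)
    if rhs[:1].isupper():
        return bindings[rhs]     # pattern variable: reuse the matched subtree
    return rhs                   # literal token: copied unchanged
```

Usage with the hypothetical rule `("X", "=", "nullset") => (("[+:", "u", "in", "X", "]", "1"), "=", "0")` and bindings `{"X": ("sp{x}", "*", "s")}` produces `(("[+:", "u", "in", ("sp{x}", "*", "s"), "]", "1"), "=", "0")`.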

We  will  not  describe  differential  manipulation  of 
global  functions  (e.g.  data  flow  maps)  using  any  systematic 
formalism  but  rather  will  use  ad   hoc   procedures  to  do  this. 
Such  procedures  might  even  have  to  recalculate  such  global 
functions  completely. 

Although  chaining  directions  will  not  be  used  in  our 
proposed  initial  system,  their  success  as  reported  by 
Loveman [L1] and Kibler [KI1] in providing a way to automatically
link several low level transformations to achieve
a higher level transformation gives chaining mechanisms high
priority for future extensions to our work (cf. Appendix D,
Sections  VII,  VIII  for  examples).   However,  the  system  we 
describe  will  contain  lower  level  primitives  which  can 
support  the  Kibler  chaining  technique  programmed  in  Appen- 
dix D. 

These  primitives  serve  to  limit  costly  searching 
through  the  parse  tree  and  production  space  by  defining  a 
tree  locality  and  selecting  transformations  to  try  within 
this  locality.   Program  localities  can  be  defined  by  either 
a statement number (recall that each statement, including
compound statements, is given a unique sequence number by
the parser) and/or by a pattern string. A statement number
locator defines the locality to be searched as the subtree
rooted at the particular statement. A pattern string
locator  limits  the  locality  to  be  searched  by  matching  the 
string  against  subparts  of  the  current  locality.   The 
pattern  string  and  matching  operation  in  this  case  are 
somewhat  less  restricted  than  the  production  LHS  pattern 
and  its  respective  matching  operation;  i.e.,  a  string  locator 
may  contain  fragments  of  syntactic  tokens  and  may  match 
only  part  of  a  subtree  successfully.   Consequently,  we 
expect  a  user  of  our  system  to  be  able  to  define  a  locality 
in  his  program  text  about  as  easily  as  he  could  find  a 
portion  of  text  by  means  of  a  general  text  editor. 

Transformations  are  selected  by  supplying  a  name  and 
proper  parameters.   The  exact  format  of  the  commands  to  be 
supported  by  our  system  will  be  discussed  in  the  following 
subsection which deals with the 'command processor'.

(iv) COMMAND PROCESSOR

The  command  processor  (CP)  interfaces  between  a  user 
and the program manipulation system. One of the CP's
responsibilities  is  to  validate  system  commands  which  a 
user  can  enter  from  a  teletype.   Once  validated,  a  command 
is  transmitted  to  an  appropriate  utility  for  execution. 
All  I/O  is  handled  by  command  processor  formatting  and 
diagnostic  routines.   Requests  for  user  input  might  originate 
from  a  utility  or  from  the  CP  itself.   Error  diagnostics  and 
informative messages are channelled through the CP.

The CP prompts the user to enter input by printing a
prompt character > at the beginning of a line. All
commands begin with a special character $ to distinguish
commands from other kinds of input.

A  state  of  the  CP  may  be  described  by  two  components, 
a file table FTAB and a current tree location LOC. FTAB
maps  file  names  to  tuples  containing  relevant  file  informa- 
tion.  Each  such  file  contains  a  parse  tree  representation 
of  a  SUBSETL  program.   A  special  file  named  PARSEDIN  refers 
to  the  file  currently  being  edited.   LOC  refers  to  the 
current  locality  to  be  searched  in  PARSEDIN.   Initially 
FTAB = nullset and LOC = 0. A transition from one CP state
to  another  takes  place  in  accordance  with  the  following 
list  of  user  commands: 
$PARSE,<FNAME> - CP validates <FNAME> as an accessible coded
file. If validation succeeds, control passes to the
parser. If parsing succeeds, PARSER outputs its parse
tree of <FNAME> onto the file named PARSEDIN (previous
contents of PARSEDIN are destroyed). FTAB(PARSEDIN)
must be appropriately set, and LOC refers to the root
node <program> in PARSEDIN.

$UNPARSE[,<FNAME>] - CP checks that FTAB is not empty.
If it is not, CP passes control to the UNPARSER
utility to print out the source text of the current
PARSEDIN file, with statement numbers, at the terminal.
An optional file name parameter can be used if the
user wants the source to be placed on a file instead of
at the terminal. In that case a new entry for <FNAME> in
FTAB must be made.

$L,<statement#> - This command defines a locality in terms
of a particular statement. CP checks that LOC is not
zero, validates the number <statement#>, and if successful
places a reference to the subtree corresponding
to <statement#> in LOC. $L,±s moves the locality up
or down the tree.

$P,<pattern>  -  This  command  defines  a  new  locality  within 
the  current  locality  based  on  the  first  successful 
match  between  <pattern>  and  a  sublocality  according  to 
a  depth  first  search  within  the  current  locality.  CP 
must first check that LOC is not 0 and that <pattern>
is well formed before the search begins. When
the first successful match occurs, LOC is set to the
new node of PARSEDIN. If no successful match takes
place, LOC remains unchanged.

$PRINT - This command unparses the current locality if
LOC ≠ 0.

$STOP   -   The  program  manipulation  system  terminates. 

$SAVE,<FNAME>  -  The  PARSEDIN  file  along  with  the  current 
value  of  LOC  is  copied  to  a  file  named  <FNAME>. 
CP  must  determine  that  FTAB  is  not  empty  and  that 
a  file  of  the  name  <FNAME>  does  not  already  exist 
before  performing  its  functions.   If  all  is  successful 
CP sets FTAB(<FNAME>) to appropriate attribute values.

$RESUME,<FNAME>  -  The  file  PARSEDIN  becomes  a  copy  of  the 
parsed  file  named  <FNAME>.   The  old  PARSEDIN  file  is 
destroyed,  while  the  file  <FNAME>  remains  unchanged. 
For the functions described just above to be performed
successfully, FTAB(<FNAME>) must be defined.
LOC  is  also  reset.   Note  that  both  the  SAVE  and  RESUME 
functions  provide  a  crude  manual  backtracking  facility. 
A  system  user  will,  thus,  be  able  to  transform  the  same 
program  according  to  different  strategies  at  the  same 
time.   If  one  chain  of  transformations  appears  unfruit- 
ful, he  can  thus  pursue  an  alternative  strategy  by 
RESUMing a previously saved state of the system.

$<TRANSFORMATION NAME>,<PARAMETER LIST> - This command serves
to select, validate, and perform a transformation on
the PARSEDIN file. CP must validate that PARSEDIN exists
by checking that FTAB ≠ nullset. Then <TRANSFORMATION
NAME> must match the name entry of a transformation
description  in  TRANSLIB.   The  parameter  list  syntax 
must  conform  to  the  format  part  of  the  name  entry. 
If  validation  succeeds,  CP  passes  a  reference  to  the 
entry  in  the  TRANSLIB  and  the  parameter  list  to  the 
transformation  generators  for  execution. 
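The CP state (FTAB, LOC) and the validation checks for a few of these commands can be sketched as a small state machine. This Python sketch is illustrative only: `ROOT`, the `parse` callback (which here returns a dict from statement numbers to statements), and the error strings are all hypothetical stand-ins, and transformation dispatch, $UNPARSE, $P and $PRINT are left out.

```python
ROOT = "PROGRAM"  # hypothetical stand-in for the root node <program> of PARSEDIN


class CommandProcessor:
    """Toy CP: FTAB maps file names to (saved LOC, parse tree); LOC = 0 means
    no current locality. `parse` is a hypothetical parser callback."""

    def __init__(self, sources, parse):
        self.sources = sources   # accessible coded files: name -> source text
        self.parse = parse
        self.ftab = {}           # FTAB, initially nullset
        self.loc = 0             # LOC, initially 0

    def run(self, command):
        if not command.startswith("$"):
            return "error: commands begin with $"
        name, _, arg = command[1:].partition(",")
        name, arg = name.strip().upper(), arg.strip()
        if name == "PARSE":
            if arg not in self.sources:
                return f"error: {arg} is not an accessible file"
            # previous contents of PARSEDIN are destroyed
            self.ftab["PARSEDIN"] = (ROOT, self.parse(self.sources[arg]))
            self.loc = ROOT
        elif name == "L":
            if self.loc == 0:
                return "error: no current locality"
            if not arg.isdigit() or int(arg) not in self.ftab["PARSEDIN"][1]:
                return f"error: no statement {arg}"
            self.loc = int(arg)
        elif name == "SAVE":
            if "PARSEDIN" not in self.ftab or arg in self.ftab:
                return "error: nothing parsed, or file already exists"
            self.ftab[arg] = (self.loc, dict(self.ftab["PARSEDIN"][1]))
        elif name == "RESUME":
            if arg not in self.ftab:
                return f"error: FTAB({arg}) is undefined"
            saved_loc, tree = self.ftab[arg]
            self.ftab["PARSEDIN"] = (saved_loc, dict(tree))  # saved file unchanged
            self.loc = saved_loc
        else:
            return f"error: unknown command ${name}"
        return "ok"
```

Copying the tree on $SAVE and $RESUME is what gives the crude backtracking described above: later transformations of PARSEDIN cannot disturb a saved state.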

The  above  command   set  is  an  initial  collection  of 
functions  to  be  implemented  in  our  first  system. 

The  following  steps  are  necessary  for  system  startup: 

1.  SUBSETL   source  text  files  are  produced  externally. 

2.  Source  text  files,  TRANSLIB,  and  the  program  manipulation 
system  (PMS)  execute  module   are  requested  for  a  run. 

3.  When  the  PMS  module  is  first  executed,  control  passes 
to  the  CP  routine  which  prompts  the  user  to  enter  a  command. 

4.  If  FIRSTFILE  is  the  first  file  that  a  user  wants  to 
transform,  then  the  command  $PARSE, FIRSTFILE  is  entered. 
If  FIRSTFILE  is  successfully  parsed,  transformations  can 
then  be  performed.   Otherwise,  the  user  must  select  another 
source  file  for  parsing  or  terminate,  edit  FIRSTFILE,  and 
try  again. 

Let  us  consider  the  following  topological  sort  text 
as  a  first  example  to  illustrate  system  functioning: 
/* sp is a predecessor relation which defines a partial
   order on s */

t = nulltuple;

(while ∃x ∈ s | (sp{x} * s) = nullset)
  t = t + [x];
  s = s - {x};
end while;       /* t will be the total order */

which  is  contained  in  a  coded  file  named  TOPSORT.   For  this 
example,  we  will  use  two  transformations  from  TRANSLIB, 
∃FORMAT and SETEQNL (Appendix D, IX describes the collection
of transformations that we assume will be available), to
transform  the  existential  quantifier  of  the  topological 
sort  above  into  a  more  convenient  form. 
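For readers without a SETL system at hand, TOPSORT can be transliterated into Python roughly as below. The sketch assumes `sp` is a dict mapping each element x to its predecessor set sp{x}, and models SETL's arbitrary choice ∃x by taking the first qualifying element Python's set iteration yields. Note that every pass rescans s and recomputes the intersections, which is the repeated work the later transformations remove.

```python
def topsort_naive(s, sp):
    """Transliteration of the TOPSORT loop (a sketch, not the system's input)."""
    s = set(s)                # work on a copy, since the loop removes elements
    t = []                    # t = nulltuple
    missing = object()        # sentinel: no element satisfies the test
    while True:
        # (while ∃x ∈ s | (sp{x} * s) = nullset): take the first such element
        x = next((v for v in s if not (sp.get(v, set()) & s)), missing)
        if x is missing:
            return t
        t.append(x)           # t = t + [x]
        s.remove(x)           # s = s - {x}
```

If sp contains a cycle, the loop simply stops, leaving the cyclic elements out of t.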

During step 3 of system startup, the command
processor will first prompt the user (using the '>' character)
to enter a command. The steps below describe a user's
interaction with the system:

User command      System response

$PARSE,TOPSORT    TOPSORT is parsed and stored on PARSEDIN.
                  LOC is set to the root node. The system
                  then prompts the user (a prompt will always
                  be generated after the system completes each
                  task).

$UNPARSE          PARSEDIN is printed at the user's terminal
                  with statement numbers and textual structuring
                  as follows:

                  1  t = nulltuple;
                  2  (while ∃x ∈ s | (sp{x} * s) = nullset)
                  3    t = t + [x];
                  4    s = s - {x};
                     end while;

$L,2              LOC is set to the statement node corresponding to
                  the while loop. A new locality is defined.

$P,(X = Y)        Starting from the current locality, a depth first
                  search matches the pattern (X = Y) with the
                  expression (sp{x} * s) = nullset. LOC is reset to
                  the new node.

$SETEQNL          The transformation generator will immediately match
                  the LHS pattern of the SETEQNL rule to the current
                  locality. Consequently, (sp{x} * s) = nullset
                  is replaced by ([+: u ∈ (sp{x} * s)] 1) = 0, and
                  LOC remains unchanged.

$L,2              LOC is set back to the statement corresponding to
                  the while loop.

$∃FORMAT          Starting with the current locality, a depth first
                  search will eventually stop with a match between
                  the LHS pattern of the ∃FORMAT rule and the
                  existential quantifier expression in PARSEDIN.
                  After the replacement takes place, LOC will be
                  set to the current node (for the ∃ quantifier).

$UNPARSE          The following PARSEDIN text is printed at the
                  user's terminal:

                  1  t = nulltuple;
                  2  (while ∃x ∈ {v ∈ s | ([+: u ∈ (sp{v} * s)] 1) = 0})
                  3    t = t + [x];
                  4    s = s - {x};
                     end while;

$SAVE,TOP1        A new two-record file, TOP1, is created with the
                  current value of LOC as the first record and a
                  copy of PARSEDIN as the second.
Next, suppose that TOP2 is a saved file containing
the following parsed code (representing a lower level version
of a topological sorting algorithm):

 1  t = nulltuple;
 2  (∀x ∈ s)
 3    COUNT(x) := [+: y ∈ (sp{x} * s)] 1;
    end ∀;
 4  (∀x ∈ s)
 5    succ(x) := {y ∈ s | x ∈ sp{y}};
    end ∀;
 6  ZRCOUNT = {x ∈ s | COUNT(x) = 0};
 7  (while ∃x ∈ ZRCOUNT)
 8    t := t + [x];
 9    (∀y ∈ succ(x))
10      Q := {z ∈ s | z = y};
11      ZRCOUNT := ZRCOUNT - {z ∈ Q | COUNT(z) = 0};
12      COUNT(y) := COUNT(y) - 1;
13      ZRCOUNT := ZRCOUNT + {z ∈ Q | COUNT(z) = 0};
      end ∀;
14    ZRCOUNT = ZRCOUNT - {x};
15    s = s - {x};
    end while;

The TRANSLIB entries listed in Appendix D, IX include
cleanup transformations which can be used to simplify the
source code above.
What follows below is a scenario of a user's interaction
with the transformation system applied to the source
for TOP2.


User command           System response

$RESUME,TOP2           The current PARSEDIN is replaced by a copy of
                       TOP2, and LOC is restored.

$L,10                  {z ∈ s | z = y} is replaced by {y} * s; the
${TO*                  system prints s INCS {y} at the terminal and
$*SIMP                 replaces {y} * s by {y}.

$L,10                  Occurrences of Q in statements 11 and 13 are
$VSUBST,SCOPE=11       replaced by {y} after the enabling condition
$VSUBST,SCOPE=13       is validated.

$L,11                  The system prints ∀z ∈ {y} | COUNT(z) ≠ 0 and
${=NL                  replaces the setformer of statement 11 by
                       nullset.

$L,11                  ZRCOUNT := ZRCOUNT - nullset becomes
$IDEM                  ZRCOUNT := ZRCOUNT. Statement 11 is
$USELESS               removed.

$L,13                  IF COUNT(y) = 0 then {y} else nullset
${TOIF                 replaces {z ∈ {y} | COUNT(z) = 0}.

$L,13                  After validating the enabling conditions, the
$DISTIF1               IF expression at statement 13 is distributed
                       into a conditional statement:
                         IF COUNT(y) = 0 then ZRCOUNT := ZRCOUNT + {y};
                         else ZRCOUNT := ZRCOUNT + nullset;
                         end if;

$IDEM                  ZRCOUNT := ZRCOUNT + nullset becomes
$L,-1                  ZRCOUNT := ZRCOUNT, and the locality is moved
                       up to this statement.

$USELESS               The useless statement is removed.

$L,13                  Statement 13 becomes
$EMPTYELSE               IF COUNT(y) = 0 then ZRCOUNT := ZRCOUNT + {y};

$DEADELIM,SCOPE=7      The dead code elimination routine will
                       eliminate statement 10, Q := {y}, as well
                       as the last assignment statement,
                       s := s - {x};

$L,5                   succ(x) := {y ∈ s | x ∈ sp{y}};
${TO∀                  is transformed to the following loop:
                         (∀y ∈ s | x ∈ sp{y}) succ(x) := succ(x) + {y};
                         end ∀;

$L,4                   The ∀ iterators at statements 4 and 5 are
$∀CONC                 now trivially combined into the form
                         (∀x ∈ s, y ∈ s | x ∈ sp{y})
                           succ(x) := succ(x) + {y};
                         end ∀;

$∀COMMUTE              These three commands simplify the loop
$∀SIMP*                at statement 4 to an equivalent loop
$*COMMUTE                (∀y ∈ s, x ∈ sp{y})
                           succ(x) := succ(x) + {y};
                         end ∀;
$L,4                   The system prepares the loop at statement 4
$∀BRKUP                for jamming with the loop at statement 2.

$L,2                   The system makes sure that the jammed blocks
$JAM                   are disjoint and carries out the
                       transformation.

$UNPARSE,TOP           The system places source text for the
                       current PARSEDIN, reflecting all the
                       transformations applied above, on a file
                       named TOP. We state this simplified version
                       of the topological sort just below:

    t = nulltuple;
    (∀w ∈ s)
      COUNT(w) := [+: y ∈ (sp{w} * s)] 1;
      (∀x ∈ sp{w})
        succ(x) := succ(x) + {w};
      end ∀;
    end ∀;
    ZRCOUNT := {x ∈ s | COUNT(x) = 0};
    (while ∃x ∈ ZRCOUNT)
      t = t + [x];
      (∀y ∈ succ(x))
        COUNT(y) := COUNT(y) - 1;
        IF COUNT(y) = 0 then ZRCOUNT := ZRCOUNT + {y};
        END IF;
      end ∀;
      ZRCOUNT := ZRCOUNT - {x};
    end while;

$STOP                  The program manipulation system terminates.
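The simplified topological sort produced by this session can likewise be transliterated into Python to see why it is cheaper: COUNT(w) holds the number of predecessors of w still inside s, succ inverts sp, and ZRCOUNT holds the elements whose count has dropped to zero, so each step touches only the successors of x instead of rescanning s. As before, this is a sketch assuming `sp` is a dict of predecessor sets and modeling ∃x by an arbitrary pick from ZRCOUNT.

```python
def topsort_counts(s, sp):
    """Transliteration of the transformed topological sort (a sketch)."""
    s = set(s)
    count, succ = {}, {}
    for w in s:                                    # (∀w ∈ s)
        preds = sp.get(w, set())
        count[w] = len(preds & s)                  # COUNT(w) := #(sp{w} * s)
        for x in preds:                            # (∀x ∈ sp{w})
            succ.setdefault(x, set()).add(w)       # succ(x) := succ(x) + {w}
    zrcount = {x for x in s if count[x] == 0}      # ZRCOUNT := {x ∈ s | COUNT(x) = 0}
    t = []
    while zrcount:                                 # (while ∃x ∈ ZRCOUNT)
        x = next(iter(zrcount))
        t.append(x)                                # t = t + [x]
        for y in succ.get(x, ()):                  # (∀y ∈ succ(x))
            count[y] -= 1                          # COUNT(y) := COUNT(y) - 1
            if count[y] == 0:
                zrcount.add(y)                     # ZRCOUNT := ZRCOUNT + {y}
        zrcount.remove(x)                          # ZRCOUNT := ZRCOUNT - {x}
    return t
```

On an acyclic sp this yields a valid total order with work proportional to the size of s plus the size of sp, rather than rescanning s on every iteration.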


We  have  just  presented  external  design  specifications 
for  a  source  to  source  transformational  implementation  includ- 
ing an  extensive  collection  of  high  level  transformations 
required  to  accommodate  formal  differentiation  which  we 
discuss  separately  in  the  next  section. 
C. Computing the Formal Derivative
1.    Introduction. 

The  major  implementation  problems  which  we  need  to 
consider  center  on  the  construction  of  algorithms  to 
automate formal differentiation (which we abbreviate
hereafter as FD). Since FD is a basic optimization
technique   applicable  to  programs  written  in  a  variety 
of  languages,  it  is  useful  to  describe  a  language  inde- 
pendent methodology  from  which  to  derive  FD  implementa- 
tions for  particular  programming  languages.   We  will 
develop  such  a  methodology  in  this  section  for  handling 
expressions  continuous  in  all  of  their  parameters,  and 
we  derive  algorithms  for  implementing  FD  in  the  contexts 
of  FORTRAN  and  SETL. 

Aside  from  providing  a  unified  approach  to  implement- 
ing FD  in  widely  varying  languages,  the  technique  described 
here  will  not  be  bound  by  assumptions  which  limit  its 
usefulness  to  expressions  continuous  in  all  parameters,  and 
Chapter  4  will  describe  extensions  to  our  framework  so 
that  it  can  aid  in  the  development  of  FD  algorithms  for 
handling  more  general  expressions. 

We  recall  from  Chapter  1  that  a  method  for  implementing 
FD  must  perform  two  principal  functions: 

1. Find reduction candidate expressions; and

2. Perform the FD transformation on some of these
candidates.
We will describe two strategies for handling these
tasks. The first strategy is a completely automatic
approach that reduces all reduction candidate expressions.
This approach relates to FORTRAN level reduction in
strength algorithms found in [A1, C1, C2, K1, Sch1]. In the
second  approach,  reduction  candidates  are  determined  auto- 
matically, but  selection  of  candidates  for  differentiation 
is  done  interactively  from  a  terminal. 

We  propose  this  second  approach  for  an  initial  SETL 
level  FD  implementation  design  to  be  integrated  with  the 
transformational  system  described  in  the  preceding  sections 
of  this  chapter.   In  this  section  we  shall  describe  an  FD 
design  for  SETL  which  is  capable  of  applying  many  of  the 
transformations  discussed  in  Chapter  II,  C. 

An  important  human  factors  goal  of  our  design  is  to 
minimize  and  localize  the  changes  made  in  the  source  code 
due  to  application  of  our  transformations.   In  particular, 
wholesale  introduction  of  temporary  variables  to  hold 
subexpression   values,  which  is  allowable  for  optimizers 
which  only  transform  an  intermediate  text,  can  easily  make 
source  level  code  unreadable.   The  algorithms  we  use  to 
implement  FD  are  sensitive  to  this  concern. 
2. Automatic Approach

FD  is  a  kind  of  program  loop  optimization  which 
generalizes  code  motion  and  which  works  on  a  single  program 
region  at  a  time.   The  program  regions  we  will  use  for  FD 
are the 'natural' loops (described in [AU1]) which are
defined  uniquely  by  the  dominator  relation  on  a  flow 
graph  G  and  by  the  set  of  back  edges  in  G.   These  loops 
partition  G  into  single  entry  strongly  connected  regions. 
For  each  such  loop  L  we  insert  a  new  prologue  node  at  a 
place p in the parse tree T determined by the following
conditions:
i. p ∉ L;
ii. Control flow must pass through p before entering L;
iii. The single statement succeeding p in the control flow
graph on T is the entry to L.
All code pushed out of L by FD will be moved to the prologue.
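The natural loops used as FD regions can be computed with the standard textbook construction: compute dominator sets, treat an edge n -> h as a back edge when h dominates n, and collect the loop body by walking predecessors backward from n until h is reached. The Python sketch below assumes the flow graph is a dict from each node to its successor nodes and covers loop discovery only, not prologue placement.

```python
def dominators(succ, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[p] over preds p."""
    nodes = set(succ) | {m for ms in succ.values() for m in ms}
    pred = {n: set() for n in nodes}
    for n, ms in succ.items():
        for m in ms:
            pred[m].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, pred


def natural_loops(succ, entry):
    """Map each back edge (n, h), where h dominates n, to its loop body."""
    dom, pred = dominators(succ, entry)
    loops = {}
    for n, ms in succ.items():
        for h in ms:
            if h in dom[n]:                    # back edge n -> h
                body, stack = {h}, []
                if n != h:
                    body.add(n)
                    stack.append(n)
                while stack:                   # walk predecessors back to the header
                    m = stack.pop()
                    for p in pred[m]:
                        if p not in body:
                            body.add(p)
                            stack.append(p)
                loops[(n, h)] = body
    return loops
```

For the graph `{0: [1], 1: [2], 2: [1, 3]}` with entry 0, the only back edge is 2 -> 1 and the natural loop is `{1, 2}`; its prologue would sit on the edge from 0 into header 1.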

To  handle  the  two  tasks  of  finding  and  reducing  differ- 
entiable  expressions,  we  proceed  roughly  as  follows: 

1. First we find an initial set Cands_0 of differentiable
expressions and let i := 0.

2. Then we iterate steps 3 and 4 until Cands_i = nullset.

3. Remove a candidate expression from Cands_i and
differentiate it.

4. Let i := i+1 and include in Cands_i the expressions
found in Cands_{i-1}, plus any new reduction candidates
which may result from step 3.
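Steps 1 to 4 above form an ordinary worklist loop. In the Python sketch below, `differentiate` is a hypothetical callback standing in for the actual FD transformation: it is assumed to perform the reduction of e and return any new reduction candidates the reduction exposes. The list `cands` plays the role of the successive sets Cands_i.

```python
def reduce_candidates(cands0, differentiate):
    """Worklist form of steps 1-4; returns expressions in the order reduced."""
    cands = list(cands0)             # step 1: Cands_0
    reduced = []
    while cands:                     # step 2: repeat until Cands_i is empty
        e = cands.pop(0)             # step 3: remove a candidate ...
        reduced.append(e)
        for new in differentiate(e): # ... and differentiate it
            if new not in reduced and new not in cands:
                cands.append(new)    # step 4: next Cands = remaining + new ones
    return reduced
```

For example, if reducing `"i*c"` exposes `"i*c+d"` as a new candidate, starting from `["i*c", "j*c"]` reduces `"i*c"`, then `"j*c"`, then `"i*c+d"`.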


2.1   Finding  Reduction  Candidates 

Our automatic approach to FD demands that all differentiable
expressions in L should be reduced in an order
consistent with the following rule: an expression e cannot
be reduced until all differentiable subexpressions of e are
first reduced. Thus, the initial set Cands_0 of reduction
candidates will include only those differentiable expressions
which do not have reducible subexpressions. Our method
of constructing this set looks for all expressions e in L
matched by elementary expression forms found in a collection
F of such forms, and that also depend only on particular
combinations of region constants and 'induction' variables.

Each elementary form $f(x_1,\dots,x_n)$ is a pattern tree involving pattern variables $x_1,\dots,x_n$ and literal symbols. We represent a pattern tree by a set N of nodes (implemented as blank atoms) and a map Psucc associating each node n with a tuple Psucc(n) of successor nodes. We also make use of a map Plabel which is partially defined on N and which associates a node n with either the name of a pattern variable or a literal value (e.g., a constant or operator symbol). Patterns are used to match subtrees representing reducible expressions within the parse tree for the loop L. In SETL our implementation for parse trees is similar to our pattern tree implementation; i.e., Tsucc is the corresponding successor map in a parse tree, and Label associates a parse tree node n with a literal value such as a variable name, constant, or other token value (cf. Appendix E(ii) for more details on pattern and parse tree representation).
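To make this representation concrete, here is a small Python sketch. Python has no blank atoms, so a fresh object with identity only (`Atom`) stands in for one; the `Tree` class, its `node` helper and the infix `text` rendering are hypothetical conveniences for illustration, not the Appendix E(ii) code.

```python
from dataclasses import dataclass, field


class Atom:
    """Stand-in for a blank atom: an object with identity and nothing else."""
    __slots__ = ()


@dataclass
class Tree:
    succ: dict = field(default_factory=dict)   # node -> tuple of successor nodes (Psucc / Tsucc)
    label: dict = field(default_factory=dict)  # node -> pattern-variable name or literal (Plabel / Label)

    def node(self, lab=None, *kids):
        """Create a fresh node with the given successors; the label map may stay undefined."""
        n = Atom()
        self.succ[n] = tuple(kids)
        if lab is not None:
            self.label[n] = lab
        return n

    def text(self, n):
        """Rough Text(n): infix rendering for binary nodes, the label for leaves."""
        kids = self.succ[n]
        lab = self.label.get(n, "?")
        if len(kids) == 2:
            return f"({self.text(kids[0])} {lab} {self.text(kids[1])})"
        return str(lab)
```

Building the pattern for `x1 * x2` takes three nodes: two leaves labelled with pattern-variable names and a root labelled with the operator, whose successor tuple is `(x1_node, x2_node)`.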

We say that a basic form f matches a parsed expression rooted in r if the tree structures of f and r match down to the leaves of f and if the literals of f and r match identically. In SETL this is expressed more simply by the test Match(r, f, Pfunc), where Match is the boolean-valued SETL function given in Appendix E(ii) and described in Section B of the present chapter. We let Text(r) denote the text expression for the parse tree r, and say that Text(r) is a potential reduction candidate expression if the predicate $\exists f \in F \mid \mathrm{Match}(r, f, \mathit{Pfunc})$ holds. Finally, we note that when Match succeeds, its parameter Pfunc will be defined as a map associating each pattern variable x of f with the root of a matched subtree Pfunc(x) of r. If we abbreviate Text(Pfunc(x)) by $\bar{x}$, then the initial set $\mathit{Cands}_0$ of elementary reduction candidate expressions consists of all expressions Text(r), $r \in L$, satisfying the following two conditions:

1. $\exists f \in F \mid \mathrm{Match}(r, f, \mathit{Pfunc})$;

2. If $x_1,\dots,x_n$ are all the pattern variables of f, then for $i = 1,\dots,n$, $\bar{x}_i$ is either a region constant expression or an induction variable.
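The matching test and the two candidate conditions can be sketched as follows. For brevity this Python version encodes a tree as a nested tuple `(label, child, ...)` rather than as successor and label maps, treats any leaf whose label is in the hypothetical set `PVARS` as a pattern variable, and collapses the per-variable induction sets into a single set; it illustrates the idea under those assumptions and is not a transcription of Match from Appendix E(ii).

```python
PVARS = {"x1", "x2"}  # hypothetical pattern-variable names


def match(r, f, pfunc):
    """True if pattern f matches tree r down to the leaves of f; binds pfunc[x] to subtrees of r."""
    label, *kids = f
    if label in PVARS and not kids:              # a pattern variable matches any subtree
        pfunc[label] = r
        return True
    rlabel, *rkids = r
    if label != rlabel or len(kids) != len(rkids):   # literals must match identically
        return False
    return all(match(rk, fk, pfunc) for rk, fk in zip(rkids, kids))


def text(t):
    """Text(t): infix rendering for binary nodes, the label itself for leaves."""
    label, *kids = t
    if len(kids) == 2:
        return f"({text(kids[0])} {label} {text(kids[1])})"
    return str(label)


def is_elementary_candidate(r, forms, region_consts, induction_vars):
    """Conditions 1 and 2: some f in F matches r, and every bound subtree is a
    region constant or an induction variable (a single IV set is a simplification)."""
    for f in forms:
        pfunc = {}
        if match(r, f, pfunc) and all(
                text(t) in region_consts or text(t) in induction_vars
                for t in pfunc.values()):
            return True
    return False
```

With the form `("*", ("x1",), ("x2",))`, region constants `{"4"}` and induction variables `{"i"}`, the tree for `i * 4` qualifies, while `i * j` and `i + 4` do not.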

We categorize induction variables according to the kinds of modifications they undergo in L. By constructing such categories appropriately, we can decide whether within L an expression is continuous relative to all modifications to variables on which it depends. To see how this is done, it is convenient to regard each elementary form $f(x_1,\dots,x_n)$ in F as an elementary expression $E = f(x_1,\dots,x_n)$ in the variables $x_1,\dots,x_n$. We can then specify the kinds of modifications to $x_1,\dots,x_n$ relative to which f varies continuously, and can describe associated update corrections to E by entries in a table of derivatives D. More specifically, if E is continuous with respect to a change $x_i := g(y_1,\dots,y_m)$ and if the pre and post derivatives to E are

$$E := \mathit{preD}(y_1,\dots,y_m,x_1,\dots,x_n,E) \quad\text{and}\quad E := \mathit{postD}(y_1,\dots,y_m,x_1,\dots,x_n,E),$$

then the set $D(x_i, f)$ will contain the triple

$$[\,x_i := g(y_1,\dots,y_m),\ \ E := \mathit{preD}(y_1,\dots,y_m,x_1,\dots,x_n,E),\ \ E := \mathit{postD}(y_1,\dots,y_m,x_1,\dots,x_n,E)\,] \tag{1}$$
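As an illustration of a triple (1), consider the form $f(x_1, x_2) = x_1 * x_2$ and the change $x_1 := x_1 + y_1$, where $x_2$ and $y_1$ are assumed to be region constants. A plausible entry in $D(x_1, f)$, assumed here rather than copied from Appendix C, has an empty pre derivative and the post derivative $E := E + y_1 * x_2$. The Python sketch below models each component of the triple as a function on an environment (a hypothetical encoding) and checks that E stays equal to $x_1 * x_2$.

```python
# Hypothetical entry of D(x1, f) for f(x1, x2) = x1 * x2, with x2 and y1 loop invariant.
# Each component of the triple is modelled as a function that updates an environment.

def mod(env):        # parameter change pattern: x1 := x1 + y1
    env["x1"] = env["x1"] + env["y1"]


def pre_d(env):      # pre derivative: nothing to do before the change
    pass


def post_d(env):     # post derivative: E := E + y1 * x2
    env["E"] = env["E"] + env["y1"] * env["x2"]


D = {("x1", "mul"): [(mod, pre_d, post_d)]}


def apply_change(env, triple):
    """Run preD, then the change itself, then postD, as inserted derivative code would."""
    change, pre, post = triple
    pre(env)
    change(env)
    post(env)
```

Because $y_1 * x_2$ is itself a region constant, it could be computed once outside the loop, leaving one addition per change inside L.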

Once the set D of triples (1) is formed, we can regard the three components of (1) as patterns for use in constructing induction variables and also in generating derivative code. Also, we can use our tree pattern matching routine Match (given in Appendix E) to match actual modifications to variables within L against the parameter definition patterns stored as the first components of triples (1) contained in D. Additionally, we ease the task of recognizing redefinitions of variables on which reduction candidates depend by assuming that variable types are made available at compile time and that all occurrences of a variable of the same name are bound to the same data structure.

In general, we must allow a different set of induction variables for every component of every elementary expression in F. Given a variable v found in L and an elementary form $f(x_1,\dots,x_n)$, v belongs in the i-th induction set for f, which we denote $IV(x_i, f)$, iff the following two conditions hold:

1. All definitions of v in the loop L in which FD is applied match parameter definition patterns in $D(x_i, f)$.

2. For each such definition pattern the corresponding derivatives must consist of easy calculations relative to f.

We  indicate  costly  subparts  of  derivatives  by  underlining 
them  in  the  D  tables  of  Appendix  C,  and  require  that  such 
subparts  be  reducible.   Thus,  each  subpart  must  match  an 
elementary  expression  g  found  in  F,  and  must  depend  on 
induction  variables  of  g. 

A  procedure  to  find  induction  variables  satisfying  the 
two  conditions  above  can  be  based  largely  on  the  F  and  D 
tables.   Of  course,  the  F  and  D  tables  must  reflect  some 
appropriate,  even  if  loose,  idea  of  the  relative  cost  of 
operations.   This  informal  measure  of  cost  guides  us  in 
determining  what  elementary  expressions  to  include  in  F, 
what formal derivative rules to place in D, and also what
subexpressions  of  these  derivatives  must  also  be  reduced  in 
order  to  make  FD  profitable. 

Before describing our fully automatic reduction algorithm, we note that condition 2 above, which governs the construction of sets of induction variables, involves a recursive step taken whenever one application of an FD transformation leads to a chain of successive transformations. To prevent the possibility of infinite chains of steps of this sort, we admit only a bounded number of new differentiable expressions introduced as part of derivative code. In this connection we use the following general heuristic: never reduce an expression which has already been reduced.

2.2   Reduction  Algorithm 

We shall now give additional details concerning the procedures used to detect induction variables and reduction candidate expressions. These procedures employ patterns stored in our tables F and D to match expressions and statements in a loop L. Derivative patterns stored in the D table will be used as macros which generate actual derivative code to be inserted into L. Expansion of these syntax macros can involve simple text substitution in which pattern variables function as substitutable parameters. Together, substitution and matching allow us to handle a general family of differentiable expressions formed by composition from elementary expressions.

In SETL, the function subprogram Expand(f, Pfunc) (cf. Appendix E(iii)) implements macro expansion. The two parameters of Expand are a pattern tree f and a map Pfunc which associates each pattern variable x of f with a parse tree Pfunc(x). Expand(f, Pfunc) will return the root of a new parse tree which results from replacing each pattern variable x of f by Pfunc(x).
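A minimal Python sketch of macro expansion in the spirit of Expand, assuming (for brevity) that trees are nested tuples `(label, child, ...)` and that `pfunc` is keyed by pattern-variable names. Because tuples are immutable, substituted subtrees can be shared rather than copied; an implementation over mutable successor maps would presumably have to copy them.

```python
def expand(f, pfunc):
    """Return a new tree in which every pattern-variable leaf x of f is replaced
    by the tree pfunc[x]; all other nodes are rebuilt unchanged."""
    label, *kids = f
    if not kids and label in pfunc:
        return pfunc[label]
    return (label, *(expand(k, pfunc) for k in kids))
```

For instance, expanding the post derivative pattern `E := E + y1 * x2` with `E` bound to a reduction variable `v1`, `y1` to `2` and `x2` to `c` yields the code `v1 := v1 + 2 * c`.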

To facilitate our discussion of the pattern matching and macro expansion mechanisms used by the FD algorithms to be presented, it is convenient to make use of a few additional notational devices. If a pattern f matches a tree t so that Match(t, f, Pfunc) holds, then we use the symbol $\bar f$ as an abbreviation for Text(t). Note that the parameter Pfunc will be defined after successful matching, and that subsequent execution of Expand(f, Pfunc) will produce a copy of the tree t originally matched by f. We will sometimes use the term $f(x_1,\dots,x_n)$ to denote the pattern f along with all of its pattern variables $x_1,\dots,x_n$. In this case, we use the term $\bar f(\bar{x}_1,\dots,\bar{x}_n)$ as an abbreviation for Text(t), given that $\mathrm{Match}(t, f(x_1,\dots,x_n), \mathit{Pfunc})$ holds and that $\bar{x}_i = \mathrm{Text}(\mathit{Pfunc}(x_i))$, $i = 1,\dots,n$. If for $i = 1,\dots,n$, $t_i$ is a tree and $y_i = \mathrm{Text}(t_i)$, then we also use the term $f(y_1,\dots,y_n)$ to express the same thing as Text(Expand(f, Pfunc)), where $\mathit{Pfunc}(x_i) = t_i$, $i = 1,\dots,n$.

The notation just described allows us to describe FD transformations at both implementation and abstract levels. Our implementation requires all transformations to manipulate the parse tree form of source code. However, for clarity we will often prefer to discuss FD and other transformations more abstractly, in terms of changes in source code independently of the underlying parse tree.

We will make use of the preceding notation to sketch the logic of the automatic FD procedure given below. An important characteristic of this procedure is that it only reduces elementary differentiable expressions. However, nonelementary differentiable expressions become elementary after reduction of all of their subexpressions. Thus, all differentiable expressions in L will eventually be reduced.

Algorithm 1-2.

Input: a derivative table D, a set of elementary forms F, a parse tree L of the optimization loop, and a map Defs which associates each variable name v occurring in L with the set Defs(v) of nodes in L corresponding to statements which can modify the value of v.

Output: a new optimized loop L' and its prologue code block.

1. Find the set RC of nodes in L corresponding to region constant expressions of L.

2. Compute initial sets $IV(x, f)$ of induction variables for every elementary form $f \in F$ and each pattern variable x of f.
3. Initialize Prologue to an empty code block.

4. While $\exists$ a node $t \in L$ and an elementary form $f(x_1,\dots,x_n) \in F$ such that

(1) Match(t, f, Pfunc) and

(2) for $i = 1,\dots,n$ either $\mathit{Pfunc}(x_i) \in RC$ or $\bar{x}_i \in IV(x_i, f)$,

perform steps 5, 6 and 7.

5. Generate a unique variable $v_{\bar f}$ for keeping the matched expression $\bar f$ available in L, and insert an assignment $v_{\bar f} = \bar f$ at the end of the Prologue block.

6. For each expression $\bar{x}_i$, $i = 1,\dots,n$, such that $\bar{x}_i \in IV(x_i, f)$, and for each program point $p \in \mathit{Defs}(\bar{x}_i)$ at which $\bar{x}_i$ undergoes a change $\bar{x}_i = \Delta_{\bar{x}_i}$, insert appropriate derivative code which keeps $v_{\bar f}$ available in L. This derivative code can be generated by first finding the unique triple [mod, preD, postD] belonging to $D(x_i, f)$ for which Match(p, mod, Qfunc) holds. Next, to prepare for macro expansion we must produce a new pattern variable map Sfunc (which can be formed from Pfunc and Qfunc) which maps the pattern variables found in preD and postD into appropriate trees. Finally, we expand the pre and post derivative patterns preD and postD by executing Expand(preD, Sfunc) and Expand(postD, Sfunc), and insert the resulting code immediately before and after p.

7. Within L replace all occurrences of $\bar f$ by $v_{\bar f}$. Also, within the derivative code generated in step 6 substitute the variable $v_{\bar e}$ for any expression $\bar e$ which has already been reduced. Finally, make appropriate additions to the set RC of region constants and to the induction sets IV.
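The net effect of Algorithm 1-2 on a simple loop can be seen in Python. In `original`, the product `i * c` is recomputed on every iteration; in `reduced`, step 5 initializes `v` in the prologue, step 6 inserts the post derivative `v = v + step * c` after the change to `i`, and step 7 replaces `i * c` by `v`. The function names are hypothetical, `step` is assumed positive, and hoisting `step * c` into `dv` assumes a separate code-motion pass, since that product is a region constant.

```python
def original(n, c, step):
    """Loop before FD: the product i * c is recomputed on every iteration."""
    out, i = [], 0
    while i < n:
        out.append(i * c)
        i = i + step          # i is an induction variable (step, c are region constants)
    return out


def reduced(n, c, step):
    """Loop after FD: v is kept equal to i * c by derivative code."""
    out, i = [], 0
    v = i * c                 # step 5: prologue assignment v = i * c
    dv = step * c             # region-constant product, assumed hoisted by code motion
    while i < n:
        out.append(v)         # step 7: occurrence of i * c replaced by v
        i = i + step          # the change to the induction variable
        v = v + dv            # step 6: post derivative for the change i := i + step
    return out
```

Both functions return the same list; the reduced loop trades one multiplication per iteration for one addition.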

Steps 1-5 and 7 above are fairly straightforward, but step 6 requires further explanation. We note, first of all, that as a consequence of the way our algorithm chooses expressions to reduce, each expression chosen will be elementary; i.e., if an expression pattern $f(x_1,\dots,x_n)$ arising in step 4 is the elementary form in F matching the subtree t in L, we will know that for $i = 1,\dots,n$, $\bar{x}_i$ is either an induction variable belonging to $IV(x_i, f)$ or a region constant expression. Hence, in computing the formal derivative of $v_{\bar f} = \bar f(\bar{x}_1,\dots,\bar{x}_n)$ relative to the change in a variable $\bar{x}_i$, we only need to consider two cases.

1. In the simplest case the variable $\bar{x}_i$ occurs only once in $\bar f$. In this case, at each point p within L where $\bar{x}_i$ undergoes a change $\bar{x}_i = \Delta_{\bar{x}_i}$, we compute pre and post derivatives for $v_{\bar f}$ by matching p to a parameter change pattern $x_i = g(x_1,\dots,x_n,\dots,x_m)$ in $D(x_i, f)$, where $x_1,\dots,x_n$ must match the same objects $\bar{x}_1,\dots,\bar{x}_n$ in L as are matched by the corresponding pattern variables in $f(x_1,\dots,x_n)$. The remaining pattern variables $x_{n+1},\dots,x_m$ may match arbitrary subtrees of p. Recall here that the matched parameter change pattern is the first component of a triple

$$[\,x_i = g(x_1,\dots,x_m),\ \ \mathit{preD}(x_1,\dots,x_m,E),\ \ \mathit{postD}(x_1,\dots,x_m,E)\,]$$

which we use to determine the derivative code relative to the change $\bar{x}_i = \Delta_{\bar{x}_i}$. Specifically, we insert the code produced by the expansions (involving simple text substitution)

$$\mathit{preD}(\bar{x}_1,\dots,\bar{x}_m,v_{\bar f}) \quad\text{and}\quad \mathit{postD}(\bar{x}_1,\dots,\bar{x}_m,v_{\bar f}) \tag{2}$$

immediately before and after p, respectively. Note also that the actual expansion implied by (2) can be empty.

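2. A second, more complicated case arises when more than one pattern variable of $f(x_1,\dots,x_n)$ matches the same variable of $\bar f$. Consider an expression

$$v_{\bar f} = \bar f(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_n) \tag{3}$$

matched by $f(x_1,\dots,x_n)$, in which $x \in IV(x_i, f)$ for $i = 1,\dots,j$. Suppose also that for $i = j+1,\dots,n$, either $\bar{x}_i \in IV(x_i, f)$ and $\bar{x}_i$ is different from x, or $\bar{x}_i$ is a region constant expression. Then when $v_{\bar f}$ is available just prior to a definition $x = \Delta_x$ which spoils $v_{\bar f}$, we can keep $v_{\bar f}$ available after this change by executing the following derivative code:

$$
\begin{array}{ll}
x_{\mathrm{old}} = x & \text{/* copy the old value of } x \text{ */}\\
x = \Delta_x & \text{/* } \Delta_x \text{ is the change in } x \text{ within } L \text{ */}\\
\mathit{preD}_1(\underbrace{x_{\mathrm{old}},\dots,x_{\mathrm{old}}}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f}) & \\
\mathit{postD}_1(x,\ \underbrace{x_{\mathrm{old}},\dots,x_{\mathrm{old}}}_{j-1},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f}) & \\
\quad\vdots & \\
\mathit{preD}_j(\underbrace{x,\dots,x}_{j-1},\ x_{\mathrm{old}},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f}) & \\
\mathit{postD}_j(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f}) &
\end{array}
\tag{4}
$$

where for $i = 1,\dots,j$ the macros $\mathit{preD}_i$, $\mathit{postD}_i$ are the second and third components of a unique triple (contained in $D(x_i, f)$) whose first component is a pattern matching the assignment $x = \Delta_x$.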

In deriving (4) we use the formal device of replacing the j occurrences of x in (3) by uniquely renamed new variables $x_1,\dots,x_j$, all having the same value as x just before the change $x = \Delta_x$, and modified by

$$
\begin{array}{l}
x_1 = \Delta_x\\
\quad\vdots\\
x_j = \Delta_x
\end{array}
\tag{5}
$$

just afterwards. This calls for j applications of the case 1 differentiation rule to the expression $\bar f(x_1,\dots,x_j,\bar{x}_{j+1},\dots,\bar{x}_n)$ formed from (3) by substitution. Observation shows that for each associated pre and post derivative code fragment $\mathit{preD}_i$ and $\mathit{postD}_i$, $i = 1,\dots,j$, all occurrences of $x_1,\dots,x_{i-1}$ in $\mathit{preD}_i$ and occurrences of $x_1,\dots,x_i$ in $\mathit{postD}_i$ will have the same value as the changed value of x; these occurrences can be replaced by occurrences of the variable x. We also note that all occurrences of $x_i,\dots,x_j$ in $\mathit{preD}_i$ and occurrences of $x_{i+1},\dots,x_j$ in $\mathit{postD}_i$ have the same value as the initial value of x; such occurrences may therefore be replaced by occurrences of the variable $x_{\mathrm{old}}$, i.e., by a copy of the initial value of x. The code sequence (4) then results by elimination of all dead assignments to the renamed variables $x_1,\dots,x_j$.

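Note finally that the copy operation $x_{\mathrm{old}} = x$ can sometimes be eliminated profitably by using the following approach: Find the smallest number $\ell$ between 1 and j+1 such that none of the code fragments $\mathit{preD}_i$ and $\mathit{postD}_i$, $i = \ell,\dots,j$, in (4) refer to $x_{\mathrm{old}}$. Replace all occurrences of $x_{\mathrm{old}}$ and x in $\mathit{preD}_i$ and $\mathit{postD}_i$, $i = 1,\dots,\ell-1$, by x and $\Delta_x$ respectively, and execute these fragments ahead of the assignment $x = \Delta_x$. The fragments $\mathit{preD}_i$ and $\mathit{postD}_i$, $i = \ell,\dots,j$, refer only to x and stay after the assignment. These substitutions allow us to replace (4) by the following code:

$$
\begin{array}{l}
\mathit{preD}_1(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\mathit{postD}_1(\Delta_x,\ \underbrace{x,\dots,x}_{j-1},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\quad\vdots\\
\mathit{preD}_{\ell-1}(\underbrace{\Delta_x,\dots,\Delta_x}_{\ell-2},\ \underbrace{x,\dots,x}_{j-\ell+2},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\mathit{postD}_{\ell-1}(\underbrace{\Delta_x,\dots,\Delta_x}_{\ell-1},\ \underbrace{x,\dots,x}_{j-\ell+1},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
x = \Delta_x\\
\mathit{preD}_\ell(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\mathit{postD}_\ell(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\quad\vdots\\
\mathit{preD}_j(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})\\
\mathit{postD}_j(\underbrace{x,\dots,x}_{j},\ \bar{x}_{j+1},\dots,\bar{x}_m,\ v_{\bar f})
\end{array}
\tag{6}
$$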

Since the code (6) is determined by the way in which we order the occurrences of x in (3), we can change the ordering (5) so as to generate a minimal number of calculations $\Delta_x$. When such calculations $\Delta_x$ can be eliminated entirely, or when they can be effectively eliminated by means of cleanup transformations (cf. Appendix D), then (6) will usually represent an improvement over (4). Note also that copying of nonelementary variables (e.g., array, structure or set valued variables) is likely to be expensive, so that an improved version of the update code (6) may be a necessary precondition for deciding to reduce (3) at all.
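Both cases can be checked on small examples in Python. For case 1, keeping $E = \#s$ available under the change $s := s + \{a\}$ needs only a pre derivative (a membership test that must run before the change) and an empty post derivative. For case 2, let $f(x_1, x_2) = x_1 * x_2$ match $x * x$ (so j = 2) under the change $x := x + d$, with assumed post derivatives $E := E + d * x_2$ for $x_1$ and $E := E + x_1 * d$ for $x_2$ and empty pre derivatives. `update_with_copy` follows (4); `update_without_copy` follows (6), where the smallest index whose fragments avoid $x_{\mathrm{old}}$ is 2, since only $\mathit{postD}_1$ refers to it. These derivative rules are illustrative assumptions rather than entries quoted from Appendix C, and the function names are hypothetical.

```python
def add_element(state, a):
    """Case 1: derivative code around s := s + {a}, keeping state['E'] == #s."""
    if a not in state["s"]:          # preD: must run before the change
        state["E"] += 1
    state["s"] = state["s"] | {a}    # the change s := s + {a}
    # postD: empty for this change


def update_with_copy(x, e, d):
    """Case 2, code sequence (4): keep e == x * x under the change x := x + d."""
    x_old = x                        # copy the old value of x
    x = x + d                        # the change to x
    e = e + d * x_old                # postD_1(x, x_old): second occurrence still old
    e = e + x * d                    # postD_2(x, x): both occurrences now new
    return x, e


def update_without_copy(x, e, d):
    """Case 2, code sequence (6): the copy of x is no longer needed."""
    e = e + d * x                    # postD_1 moved before the change; x_old replaced by x
    x = x + d                        # the change to x
    e = e + x * d                    # postD_2(x, x) unchanged
    return x, e
```

Both case 2 versions compute $(x + d)^2 = x^2 + d\,x + (x + d)\,d$; the second simply evaluates the first correction before x changes, which is what makes the copy unnecessary. With numbers the saving is trivial, but it matters when x is a set or tuple.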

2.3 Automatic FD for FORTRAN and SETL

We can use the preceding framework to derive an FD implementation design for a given programming language Q by constructing F and D tables, by giving a procedure for finding induction variables, and by bounding the number of expressions reduced by Algorithm 1-2 for Q.

As a first example of this observation, consider the case of FORTRAN. Our goal in FORTRAN-level FD is to reduce costly exponentiations to less costly multiplications, and to replace division and multiplication operations by inexpensive additions and subtractions. The FORTRAN F and D tables shown in Appendix C(i) reflect this aim and also reflect our assumptions about the relative cost of these operations. Note that subexpressions underlined in the D table represent costly subparts of derivatives which must be further reduced to make these derivatives profitable.

If we examine the F and D tables for FORTRAN closely, we can see that use of a separate induction variable procedure for each parameter of every elementary expression is easily avoided. The FORTRAN table F contains three elementary forms, but the basic parameter definition patterns contained in D for each of these forms are the same. To decide whether a variable v fits these basic definition patterns we can simply evaluate the predicate

Redef1(v) ⟺ each definition to v in L is matched by any of the forms v = ±x3, v = -x3 ± x4, or v = x3 ± x4,

where x3 and x4 can be any variable or constant.

Observe, however, that for v to be a FORTRAN induction variable, we require that the code produced by expansion of every underlined subpart of the derivative patterns (in D) associated with each definition to v in L be an elementary reducible expression, and that the variables on which this reducible expression depends all be induction variables. In the case of FORTRAN only one set of induction variables needs to be defined for all entries in F and for each component of each entry. To test whether a variable v is in this set, we can use the predicate defined as follows:

Fortind(v) ⟺ Redef1(v) &
    ∀ definitions to v within L,
        if the definition is matched by the form v = ±x3 then
            Fortind(x3) else
        if the definition is matched by either v = -x3 ± x4 or v = x3 ± x4 then
            Fortind(x3) & Fortind(x4) else
        False.
In order to make Fortind a terminating algorithm, we must substitute the value True for any recursive call to Fortind which passes an argument that has been previously passed. Note finally that Fortind can be used to test variables matching both parameters of x1 * x2, the parameter x1 in x1/x2, and x2 in x1**x2. Since the D table does not specify any parameter changes for x2 in x1/x2 and x1 in x1**x2, only region constants are permitted for these parameters.
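A Python sketch of Fortind, with the termination rule above implemented as a shared set of previously passed arguments. The encoding of definitions is hypothetical: `defs` maps each variable to the list of its definitions in L, each written as `(form, operands)` with form `"copy"` for v = ±x3, `"sum"` for v = -x3 ± x4 or v = x3 ± x4, and any other tag for a definition that fails Redef1. A name with no definitions in L (a constant or region constant) passes vacuously.

```python
def fortind(v, defs, seen=None):
    """Sketch of Fortind(v): True if every definition of v in L has one of the
    Redef1 forms and, recursively, its operands pass the same test."""
    if seen is None:
        seen = set()
    if v in seen:                 # argument passed before: substitute True
        return True
    seen.add(v)
    for form, ops in defs.get(v, []):
        if form == "copy":        # v = ±x3
            ok = fortind(ops[0], defs, seen)
        elif form == "sum":       # v = -x3 ± x4  or  v = x3 ± x4
            ok = fortind(ops[0], defs, seen) and fortind(ops[1], defs, seen)
        else:                     # Redef1(v) fails
            return False
        if not ok:
            return False
    return True
```

Sharing `seen` across the whole evaluation is safe here because any False result immediately propagates to the top-level call. With `i = i + 1`, `k = i + j` and `j = k`, all three variables pass even though `j` and `k` refer to each other, while a variable defined by a product fails.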

To  show  termination  of  Algorithm  1-2  we  use  the  fact 
that  in  any  program  there  can  exist  only  a  finite  number 
of  induction  variables  and  region  constants.  Thus,  in 
reducing  any  binary  operation,  only  a  finite  number  of  other 
different  costly  binary  operations  can  be  generated  and 
subsequently  reduced. 

In  contrast  to  FORTRAN,  the  task  of  recognizing  the 
relative  cost  of  set  theoretic  operations,  necessary  for 
constructing  SETL  F  and  D  tables,  is  not  so  simple.   The 
main  difficulty  is  in  statically  estimating  the  relative 
sizes  of  sets   (which  helps  to  deduce  whether  an  expression 
iterates  over  a  large  or  small  set) .   One  possible  way  of 
making  this  estimation  automatic  might  involve  a  special 
type  of  global  analysis,  specifically  by  incorporating 
the  property  'this  data  object  is  a  small  set'  in  a  monotone 
framework  to  which  Kildall's  technique  [KI2]  applies.  The 
approach  taken  in  this  thesis,  however,  is  much  simpler 
since we only need to determine the relative sizes of sets A and x in the context of an assignment x := x + A executed repeatedly in a program loop L. As shown by the case studies in Chapter 4 and Appendix F, in such contexts it is highly likely that A will be small relative to x.

F and D tables for SETL are shown in Appendix C(ii). Every derivative shown in D is considered profitable a priori except for subcase 5, Rule 2, in which we require that the set $s_0$ be reduced to make FD worthwhile.

A straightforward procedure for finding induction variables for SETL is also available. The F and D tables shown in Appendix C(ii) suggest that this algorithm should work with six different sets of induction variables, and that no recursion is required. We define these six sets as follows:

1. $IV_1$ = {all set-valued variables x all of whose redefinitions in L are of the form x := x + A}

2. $IV_2$ = {all set-valued variables x which are only modified in L according to the rule x := x - A}

3. $IV_3$ = {all map variables f affected only by indexed assignments $f(y_1,\dots,y_n) := z$ in L, where $y_1,\dots,y_n$ are region constants}

4. $IV_4$ = {all integer variables x which change only by x := x + A in L}

5. $IV_5$ = {all tuple variables x that only vary in L in the following ways: x := x + x3, x(x3) := x4, and x(x3:x4) := x5}

6. $IV_6$ = {all tuple variables x which only undergo modifications of the form x := x3 + x in L}.
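Because none of these sets requires recursion, each can be computed by a single pass over the redefinitions found in L. The Python sketch below covers only $IV_1$, $IV_2$ and $IV_4$; the tags `"add"` (x := x + A) and `"sub"` (x := x - A), the `types` map, and the choice to leave never-modified variables to the region constants are assumptions of the sketch.

```python
def induction_sets(redefs, types):
    """Return (IV_1, IV_2, IV_4) from a map var -> list of redefinition tags in L."""
    iv1, iv2, iv4 = set(), set(), set()
    for v, tags in redefs.items():
        if not tags:
            continue                      # never modified in L: left to the region constants
        only_add = all(t == "add" for t in tags)
        only_sub = all(t == "sub" for t in tags)
        if types[v] == "set" and only_add:
            iv1.add(v)                    # set changed only by x := x + A
        if types[v] == "set" and only_sub:
            iv2.add(v)                    # set changed only by x := x - A
        if types[v] == "int" and only_add:
            iv4.add(v)                    # integer changed only by x := x + A
    return iv1, iv2, iv4
```

A set variable that is both grown and shrunk in L lands in neither $IV_1$ nor $IV_2$.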

Any elementary expressions of the forms 1, 2, 3, 4, 11, and 12 of the F table which depend on variables in $IV_1 \cup IV_2$ or on region constants are considered to be reduction candidates. Differentiable expressions 9 must depend on region constants and arguments x1 and x2 which are found
in  IV   and  the  set  RC  of  region  constants.  For  expressions 
of  the  forms  6,  8,  and  10,  we  require  that  each  non  loop 
invariant  argument  belong  to  IV^ .   Subclass  7   of  6 ,  however, 
may  also  involve  variables  in  IV„ .   Induction  variables  xl  and 
x2  of  expression  class  13   must  belong  to  IV^  u  RC  and 

D 

IV  u  RC   respectively.  Set  former  expressions  matching 
the basic reducible form number 5 will be considered
differentiable if $x1 \in (IV_1 \cup IV_2 \cup RC)$ and if all map
variables   (which  match  to  special  patterns  denoted  by  f„ 
in  the  D  table)  occurring  in  the  boolean  subexpression  K 
belong to $IV_3 \cup RC$. For expressions $\{x \in s \mid K(x)\}$ of form number 5, we further stipulate that all occurrences of map induction variables within K must head map retrieval terms involving the bound variable x of the set former.

To  show  termination  of  Algorithm  1-2  for  SETL,  we  can 
give  a  bound  on  the  number  of  differentiable  expressions 
introduced  by  FD  based  on  the  number  of  indexed  assignments 
to  map  induction  variables  and  to  the  f-depth  of  auxiliary 
sets generated as derivative code from basic form #5 of Appendix C(ii) (cf. Chapter 2 for further discussion of Rule 2).

3. A Semiautomatic Approach

In  SETL,  expressions  can  depend  on  large  sets  and  maps 
so  that  a  strategy  for  FD  which  seeks  to  differentiate  an 
expression  e  only  after  first  reducing  all  subexpressions 
of  e  may  be  prohibitively  expensive  in  space  usage.  There- 
fore, it  may  be  useful  to  consider  an  alternative  strategy 
which  trades  off  speed  for  space  by  differentiating  expres- 
sions with  unreduced  subexpressions. 

As an example of this, consider

$$c = \{x \in (s + t) \mid K(x)\}. \tag{7}$$

If we differentiate (7) without first reducing the union operation s + t, we save the space which would be required for storing c' = s + t if c' were also reduced. Note also that by avoiding reduction of c' we can even gain speed, since the prederivatives of c relative to the changes s := s + A and s := s - A are

$$c = c + \{x \in A \mid K(x)\} \quad\text{and}\quad c = c - \{x \in A \mid x \notin t \;\&\; K(x)\} \tag{8}$$

respectively.
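The prederivatives (8) can be checked with Python sets. Here `K` is a hypothetical loop-invariant predicate, and c is maintained without ever materializing s + t: each update consults only A and t, which is the space saving described above. The removal rule keeps elements that remain in the union through t.

```python
def K(x):
    """Hypothetical loop-invariant predicate used in the set former."""
    return x % 3 == 0


def add_to_s(s, t, c, A):
    """Prederivative for s := s + A, then the change; returns the new (s, c).
    t is unused here but kept so both updates share one signature."""
    c = c | {x for x in A if K(x)}
    return s | A, c


def remove_from_s(s, t, c, A):
    """Prederivative for s := s - A, then the change; returns the new (s, c)."""
    c = c - {x for x in A if x not in t and K(x)}
    return s - A, c
```

After any sequence of these updates, c equals `{x for x in s | t if K(x)}`, the value (7) would have if recomputed from scratch.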

However,  as  yet  we  lack  an  automatic  strategy  which 
deals  with  space/time  tradeoffs.   Hence  we  propose  an 
interactive  system   in  which  expressions  are  manually 
selected  for  reduction,  one  at  a  time.  Of  course,  before 
an expression can be selected it must be marked 'reducible' by a routine described below, which we will call Algorithm 1.

To  select  a  qualified  expression  for  reduction,  a  user 
then  issues  the  command, 

(9)  $FD,  LOOP#,  NAME  =  EXP 

from his terminal; this directs the FD transformation generators to differentiate the expression EXP, and to keep its value available within a uniquely named variable NAME throughout a loop identified by a statement number, LOOP#.
Our  proposed  system  will  automatically  validate  this  command 
before  actually  performing  FD  by  use  of  a  procedure  we  will 
refer  to  as  Algorithm  2. 

Although  our  description  of  Algorithms  1  and  2  will 
be  language  independent,  these  algorithms  should  only  be 
implemented  for  languages  in  which  space  utilization  can 
be of overriding concern. Since the cost of extra storage required by FORTRAN-level FD is low, Algorithm 1-2 is preferable in the FORTRAN context. However, we expect
that  the  semiautomatic  FD  approach   we  are  about  to  describe 
can  be  tailored  effectively  to  very  high  level  languages 
such  as  SETL,  SNOBOL,  and  APL. 

3.1 Algorithm 1

Once induction sets and the set of region constants RC are computed, we can find all nodes n in the parse tree of a program loop L which correspond to differentiable expressions. The procedure we will sketch for doing this is bottom up in that inner expressions are handled before outer expressions; i.e., the algorithm starts with the leaves of a parse tree representation of an expression, proceeds to predecessors, and decides reducibility along the way. To decide reducibility, Algorithm 1 uses the following criteria. An expression e in L is reducible if e is an elementary reducible expression (cf. the definition of $\mathit{Cands}_0$ in Section 2.1); e will also be considered reducible if it is matched by some $f \in F$ and if each subexpression $\bar{x}$ (of e) matched by a pattern variable x of f is either a region constant expression, a member of the induction variable set $IV(x, f)$, or an induction expression for f and x. Here $\bar{x}$ is an induction expression for f and x if $\bar{x}$ is reducible and, once reduced with its value kept available in a variable t, t would belong in $IV(x, f)$.

The overall logic of Algorithm 1 is as follows:

1. For each leaf ℓ in L, if ℓ corresponds to a region
constant of L, mark it 'good'; otherwise, if ℓ is contained
in a subtree e matched by an elementary form f in F in
which ℓ is matched by some pattern variable x in f, and
if Text(ℓ) ∈ IV(x,f), then mark ℓ 'good'.

2. Repeat step 3 until no more nodes in L can be marked.

3. For each unmarked node n ∈ L and for each form f ∈ F,
if the two conditions

(1) Match(n,f,Pfunc) and

(2) for all pattern variables x in f, the node Pfunc(x)
is marked 'good'

both hold, we will mark n 'reducible' and associate n with f.
We will also mark n 'good' if it represents a region constant
or an induction subexpression X, for a pattern variable x, of
an outer expression e matched by a basic form f' ∈ F. We can
determine whether X is an induction expression for x and f' in
the following way. Let W be the set of all derivative patterns
in D for the basic form f, where costly subparts within these
patterns are considered single pattern variables. (Recall
that in the D tables of Appendix C such subparts are
underlined.) If the pattern variable E denoting the value
of f could be defined by any of the definitions of E found
in W and still qualify as a member of IV(x,f'), then X is
an induction expression for x and f'.
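The marking loop of steps 1-3 can be sketched in Python as a fixed-point iteration. This is a simplified illustration, not the implementation proposed here: the node class, the table `allowed` (standing in for the IV(x,f) tests), and `result_cats` (standing in for the induction-expression test derived from D) are hypothetical, and matching against F is reduced to comparing operator names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)              # identity semantics, so nodes can serve as dict/set keys
class Node:
    op: str                       # 'var' for a leaf, otherwise an operator name
    kids: list = field(default_factory=list)
    name: str = ""                # variable name (leaves only)


def is_region_const(n, rc):
    """A subtree is a region constant expression if every leaf is a region constant."""
    if n.op == "var":
        return n.name in rc
    return all(is_region_const(k, rc) for k in n.kids)


def algorithm1(root, rc, var_cats, forms, allowed, result_cats):
    """Return the set of reducible nodes, found by repeating the marking step to a fixed point.

    rc:          names of region constants
    var_cats:    variable name -> induction categories it belongs to (stand-in for the IV sets)
    forms:       operators that have an elementary form in F
    allowed:     (op, argument position) -> categories accepted there (stand-in for IV(x, f))
    result_cats: op -> categories a reduced expression of that form belongs to
    """
    nodes, stack = [], [root]
    while stack:                              # collect every node of the tree
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.kids)
    cats = {n: frozenset(var_cats.get(n.name, ())) for n in nodes if n.op == "var"}
    reducible = set()

    def good(k, op, pos):                     # may k fill argument position pos of form op?
        if is_region_const(k, rc):
            return True
        return bool(cats.get(k, frozenset()) & allowed.get((op, pos), frozenset()))

    changed = True
    while changed:                            # repeat until no more nodes can be marked
        changed = False
        for n in nodes:
            if n.op not in forms or n in reducible or is_region_const(n, rc):
                continue
            if all(good(k, n.op, i) for i, k in enumerate(n.kids)):
                reducible.add(n)
                cats[n] = frozenset(result_cats.get(n.op, ()))  # n may now act as an induction expression
                changed = True
    return reducible
```

Applied to expression (10) of the next subsection, with S in IV2 and Q in IV1, all three nonterminal nodes are marked, inner ones in earlier passes.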
3.2 Specializations

In the very familiar FORTRAN case, all derivative
patterns in the D table (cf. Appendix C(i)) except for
the last two entries of D exactly match parameter change
patterns in D. Thus every reducible expression of forms 1
(products) and 2 (quotients) of F, and each reducible
exponentiation which only depends on parameter changes
x2 =+ x3 (where x3 must also be restricted), may be
considered an induction subexpression of all three kinds of
expressions which occur in the F table.

The following simplified FORTRAN marking algorithm
exploits this fact. In it we use the predicate

Fortind1(v) ⇔ all definitions of v in L
are of the form v =+ x3, and
Fortind1(x3) holds.

This works in conjunction with the induction variable
predicate Fortind (discussed in subsection 2.3) to find
induction expressions. We also make use of a map Mark,
partially defined on L, to indicate induction and reducible
expressions.
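Because a variable with no definitions in L satisfies Fortind1 vacuously, region constants form the base case of this recursive predicate. A minimal Python sketch follows. The encoding of definitions as ('incr', x3) and ('other',) tuples is hypothetical, and treating cyclic definitions such as v =+ v as failures is an assumption of the sketch (the text does not address cycles).

```python
def fortind1(v, defs, visiting=frozenset()):
    """Fortind1(v): every definition of v in loop L has the form v =+ x3, and Fortind1(x3) holds.

    defs: hypothetical map variable -> list of its definitions in L, where
          'v =+ x3' is encoded as ('incr', 'x3') and any other assignment as ('other',).
    A variable with no definitions in L (a region constant) satisfies the predicate
    vacuously.  A cycle such as v =+ v is treated as failing (an assumption).
    """
    if v in visiting:
        return False
    visiting = visiting | {v}
    return all(d[0] == "incr" and fortind1(d[1], defs, visiting) for d in defs.get(v, []))
```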
Algorithm IFORT

1. Initialize the map Mark to the null set.

Find the set RC of nodes in L corresponding to region
constant expressions of L.

2. For each leaf ℓ ∈ L such that ℓ ∉ RC:
if Fortind1(Text(ℓ)) then assign 'IND1' to Mark(ℓ);
otherwise, if Fortind(Text(ℓ)) then assign 'IND2' to Mark(ℓ).

3. Repeat step 4 exhaustively.

4. Separate all nonterminal nodes n in L for which

i. Mark(n) is undefined;

ii. n ∉ RC;

iii. for each successor node y ∈ Tsucc(n),
either y ∈ RC or Mark(y) is defined;

iv. there is an f ∈ F such that Match(n,f,Pfunc) holds
(note that F is defined in Appendix C(i)),

into one of the following three cases:

a) If x1 * x2 matches n and x1 (respectively x2) is a
region constant expression, assign the value of
Mark(Pfunc(x2)) (respectively Mark(Pfunc(x1))) to
Mark(n); else, if Mark(Pfunc(x1)) (Mark(Pfunc(x2)))
is equal to 'IND1', assign the value Mark(Pfunc(x2))
(Mark(Pfunc(x1))) to Mark(n); otherwise set Mark(n)
to 'IND2'.

b) If x1/x2 matches n and x2 is a region constant
expression, perform the assignment

Mark(n) := Mark(Pfunc(x1)).

c) If x1 ** x2 matches n and x1 is a region constant
expression, perform Mark(n) := Mark(Pfunc(x2)).

After IFORT terminates, all expressions Text(n) such
that Mark(n) is defined and n is a nonterminal node of L
will be reducible.
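Cases (a)-(c) and the exhaustive repetition of step 4 can be sketched in Python as follows. On a tree, a bottom-up recursion reaches the same marks as repeating step 4 until nothing changes. Expressions are encoded as variable names or (op, left, right) tuples, Fortind1 and Fortind are passed in as hypothetical callbacks, and the extra mark 'RC' for region constant subexpressions is an encoding convenience of this sketch (IFORT itself keeps them in the separate set RC).

```python
def combine(op, m1, m2):
    """Cases (a)-(c) of step 4: the Mark of a node matched by x1 op x2, given operand marks.

    Marks are 'IND1', 'IND2' or 'RC'.  Returns None when the node is not reducible.
    """
    if op == "*":                                    # case (a)
        if m1 == "RC":
            return m2
        if m2 == "RC":
            return m1
        if m1 == "IND1":
            return m2
        if m2 == "IND1":
            return m1
        return "IND2"
    if op == "/" and m2 == "RC":                     # case (b)
        return m1
    if op == "**" and m1 == "RC":                    # case (c)
        return m2
    return None


def ifort(expr, rc, is_ind1, is_ind):
    """Mark of expr, a variable name or a tuple (op, left, right); None if unmarked."""
    if isinstance(expr, str):                        # step 2: leaves
        if expr in rc:
            return "RC"
        if is_ind1(expr):
            return "IND1"
        if is_ind(expr):
            return "IND2"
        return None
    op, left, right = expr
    m1 = ifort(left, rc, is_ind1, is_ind)
    m2 = ifort(right, rc, is_ind1, is_ind)
    if m1 is None or m2 is None:                     # condition iii fails
        return None
    if m1 == "RC" and m2 == "RC":                    # condition ii: n is itself in RC
        return "RC"
    return combine(op, m1, m2)
```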

A variant of Algorithm 1 adapted to SETL has more
cases to consider but is no more complicated than IFORT.
The F and D tables for SETL are defined in Appendix C(ii).
First, we need to calculate the set of region constants RC
and the induction variable sets IV1,...,IV6 (cf. subsection
2.3). This allows us to detect the elementary reducible
expressions. In order to pick out all the reducible
expressions, we must be able to determine induction
expressions. Fortunately, the primary induction expressions
of interest are set union, intersection, set difference, and
set former (these are forms 1, 2, 3, 5, 8 and 9 in our F
table), whose derivative patterns in the D table only realize
changes of the forms x := x + A and x := x - A. Consequently,
we can treat reducible expressions of the kinds just mentioned
as induction expressions that behave in much the same way as
induction variables belonging to IV1 and IV2. This provides
a way to follow the logic of Algorithm 1 (but with a few
minor adjustments to be discussed in Chapter IV) and locate
the remaining nonelementary reducible expressions.

To illustrate the preceding remarks, consider the
expression

(10)    c = {x ∈ (T - S) | f(x) = q} - Q

occurring in a loop L to be optimized. Suppose that,
within L, the set S is modified exclusively by changes of
the form S := S - A, Q varies only by set additions
Q := Q + A, and T, f and q are all region constants. Then
a SETL version of Algorithm 1 will decide that S and Q are
induction variables belonging to IV2 and IV1, respectively.
The first reducible subexpression of c to be detected will
be c1 = T - S. Since S only undergoes set deletions, we
know that the value of c1 can only grow by set additions.
Thus, we will consider c1 to be an induction expression
belonging to IV1. At this point the expression
c2 = {x ∈ c1 | f(x) = q} can be marked reducible. Moreover,
since the subexpression c1 belongs to IV1, c2 is also an
induction expression belonging to IV1. Finally, since
both c2 and Q belong to IV1, the expression c = c2 - Q is
reducible and is an induction expression in both categories
IV1 and IV2.
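The reasoning in this example (deletions from S make T - S grow; additions to Q make c2 - Q shrink) amounts to propagating a direction of change up the expression tree. The Python sketch below illustrates that idea for the operators used in (10). The tuple encoding, the direction labels '+' (for IV1) and '-' (for IV2), and the restriction to set formers with a loop-invariant condition are assumptions of the sketch, not the SETL version of Algorithm 1 itself.

```python
FLIP = {"+": "-", "-": "+"}


def directions(expr, var_dirs):
    """Directions in which the value of expr can change inside loop L.

    '+' means growth by set additions (category IV1) and '-' means shrinkage by
    set deletions (IV2); an empty result marks a region constant.
    expr is a variable name, ('-', a, b) for set difference, ('+', a, b) for union,
    ('*', a, b) for intersection, or ('setformer', a) for {x in a | k} with a
    loop-invariant condition k.  var_dirs maps each varying variable to its directions.
    """
    if isinstance(expr, str):
        return set(var_dirs.get(expr, ()))
    op = expr[0]
    if op == "setformer":
        return directions(expr[1], var_dirs)
    a = directions(expr[1], var_dirs)
    b = directions(expr[2], var_dirs)
    if op == "-":
        return a | {FLIP[d] for d in b}   # deletions from b make a - b grow; additions make it shrink
    if op in ("+", "*"):
        return a | b                      # union and intersection are monotone in both operands
    raise ValueError(f"unsupported operator {op!r}")
```

For (10), c1 = ('-', 'T', 'S') yields {'+'}, the set former over c1 yields {'+'}, and c yields {'+', '-'}, matching the conclusions above.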
3.3 Algorithm 2

After Algorithm 1 has determined the differentiable
subexpressions in a program loop L, a user will be able
to select these subexpressions one at a time for reduction.
Selection is made by commands of the form (9), which
cause the reduction routine Algorithm 2 to execute. The
input and output specifications of Algorithm 2 are the
same as those for Algorithm 1-2, and the two algorithms
have rather similar logic. However, while Algorithm 1-2
ensures profit by reducing all reducible expressions in L,
Algorithm 2 attempts to reduce as few reducible
subexpressions of EXP as possible (so as to conserve space)
without sacrificing expected speedup.

Algorithm 2.

1. To validate the command (9), we check that LOOP#
refers to a program loop L, that NAME is not an existing
program variable, and that the expression EXP is located
in L at a node n which has been marked 'reducible' by
Algorithm 1.

2. Next we order the reducible subexpressions of EXP
in a postorder arrangement (cf. Appendix E (iv)) as
indicated by the following SETL assignment:

Cands := [x ∈ Postorder(n) | Marked(x) = 'reducible'];
3. For each node t selected from the tuple Cands,
determine the particular elementary form f in F for which
Match(t,f,Pfunc) holds, and perform steps 5-8.

4. Halt.

5. Generate a unique name v_f for the variable which will
hold the value of the subexpression f in L, and insert an
assignment v_f := f at the end of the prologue for L.

6. For each pattern variable x in f for which x is an
induction variable belonging to IV(x,f), and for each
program point p ∈ Defs(x) at which x undergoes a change
Δx, insert derivative code which keeps v_f available
in L. Since the ordering of nodes in Cands and the overall
strategy of our algorithm make f elementary, we can
compute derivative code for f in the same way as we did
in connection with Algorithm 1-2.

7. Within L replace all occurrences of f by v_f. Also,
within the derivative code generated in step 6, substitute
an appropriate variable v_e for any expression e which has
already been reduced. Next make appropriate additions
to the set RC of region constants and to the induction
sets IV. Moreover, within the derivative code generated
in step 6, mark each node n 'reducible' if the subexpression
Text(n) is formed from a derivative code subpattern
specially underlined in our D table. Recall that such
underlining indicates that further reduction is necessary
for FD to be profitable. After doing this, reduce all
such subexpressions by recursive application of
Algorithm 2.

8. For each pattern variable x in f such that x is a
generated variable holding the value of a reducible
subexpression e of f available in L, if x is not used within
derivative code generated for f, then
i. remove all (previously generated) derivative code
which keeps the value of x current;
ii. remove the initialization of x from the prologue
for L;
iii. make appropriate corrections to RC and IV;
iv. replace all occurrences of x in L by occurrences
of e.
On the other hand, if the generated variable x is used
within derivative code for f, prompt the user so that he
can supply a unique variable name to be used in place of x.
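Step 2 can be sketched in Python as follows. Listing nodes in postorder places every reducible subexpression before the expressions that contain it, which is what lets step 6 treat the form f as elementary. The tuple encoding of the tree and the `kids` and `marked` arguments are hypothetical. Because the sketch keys `marked` on tuples, structurally identical subtrees share a mark; a real implementation would key on node identity.

```python
def postorder(node, kids):
    """Nodes of the tree rooted at node, children (left to right) before their parent."""
    out = []

    def visit(n):
        for k in kids(n):
            visit(k)
        out.append(n)

    visit(node)
    return out


def cands(n, kids, marked):
    """Step 2 of Algorithm 2: Cands := [x in Postorder(n) | Marked(x) = 'reducible']."""
    return [x for x in postorder(n, kids) if marked.get(x) == "reducible"]
```

For expression (10) this yields c1 = T - S, then c2, then c, the order used in the example that follows.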

To illustrate the application of semiautomatic FD to
SETL, we again consider the expression (10), which we assume
is executed repeatedly within a program loop L. A user of
our proposed interactive system could select (10) or any of
its reducible subexpressions (marked by Algorithm 1) for
reduction. Suppose that, in order to conserve space, he
chooses to reduce the full expression (10) by issuing the
command

(11)    $FD, 15, Diff = {x ∈ (T - S) | f(x) = q} - Q
After validating (11), Algorithm 2 will arrange
the reducible subexpressions of (10) in postorder as
follows: (1) c1 = T - S, (2) c2 = {x ∈ c1 | f(x) = q},
and (3) c = c2 - Q. Then we first reduce c1, which is
matched by form 3 of the SETL F table (cf. Appendix C(ii)).
To do this, we begin by inserting the assignment

(12)    c1 := T - S;

at the end of the prologue for L. Since T is a region
constant and S is an induction variable which undergoes a
single modification S := S - A at a program point p in L,
the only update code for c1 will be the prederivative

(13)    c1 := c1 + A * T

(generated by the macro found in the D table entry D(x2,3)
of Appendix C(ii)), which will be inserted just before p.
Next, all occurrences of T - S within L are replaced by
occurrences of c1. Then the identifier c1 is added to the
induction variable set IV1.

At this point Algorithm 2 will select c2 (which is
matched by form 5 in the F table) for reduction. The
assignment

(14)    c2 := {x ∈ c1 | f(x) = q}

is inserted at the end of the prologue, just after (12),
and the prederivative code

(15)    c2 := c2 + {x ∈ (A * T) | f(x) = q}

(corresponding to the change (13)) is placed just prior
to (13). Then we replace occurrences of {x ∈ c1 | f(x) = q}
by occurrences of c2, and include c2 within the induction
variable set IV1. However, since c1 is not used in the
derivative code (15) for c2, all uses of c1 within L are
eliminated. That is, Algorithm 2 will delete the derivative
code (13) within L, remove the initialization (12) from the
prologue, and replace all occurrences of c1 within L and its
prologue by occurrences of T - S. The identifier c1
will also be removed from IV1.

Now the expression c originally selected for reduction
by the directive (11) can be processed. Algorithm 2 will
first place the initialization

(16)    c := c2 - Q;

just after (14) in the prologue. It will then examine the
D table entries associated with elementary form 3 of the F
table and determine the derivative code for c relative to
the change (15) in c2 and the change Q := Q + A in Q.
This will lead to insertion of the prederivative code

(17)    c := c + ({x ∈ (A * T) | f(x) = q} - Q);

just prior to (15). The update correction

(18)    c := c - (A * c2);

will be placed immediately before the modification to Q.
After occurrences of c2 - Q are replaced by occurrences of c,
the identifier c will be placed in both IV1 and IV2. Note that
the use of c2 within (18) prevents Algorithm 2 from avoiding
the reduction of c2 (as was done previously in the case of
c1). Therefore, the system will request a user-supplied
name to replace c2. If we suppose that this name is
Temp, then all occurrences of c2 within L and its prologue
will be replaced by Temp. The reduction procedure completes
its work by replacing all occurrences of c by the
user-supplied name Diff given in (11).

As a consequence of these actions, the end of the prologue
to L will contain the following code:

(19)    Temp := {x ∈ (T - S) | f(x) = q};
        Diff := Temp - Q;

Within L, the derivative code inserted just before the
change to S will be

(20)    Diff := Diff + ({x ∈ (A * T) | f(x) = q} - Q);
        Temp := Temp + {x ∈ (A * T) | f(x) = q};

while the update code inserted before the change to Q
will be

(21)    Diff := Diff - (A * Temp);

The reader should note that instead of using (18),
we might have used either of the following alternative
prederivative codes: c := c - {x ∈ A | x ∈ T & x ∉ S & f(x) = q},
or more simply, c := c - A. Either of these alternatives
would eliminate the troublesome use of c2 in (18) and would
make it possible to avoid reduction of c2 (thereby conserving
space). A capability to choose between competing
derivative code alternatives might be a useful future
extension to Algorithm 2.
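The correctness of (19)-(21), and of the simpler alternative c := c - A, can be checked empirically. The Python simulation below is an illustration, not part of the proposed system. It executes the loop with random changes S := S - A and Q := Q + A, runs the inserted code just before each change, and asserts after every step that Temp and Diff equal a fresh evaluation of (10). The region constants f, q and T and the small universe of integers are arbitrary test choices. SETL's + and * on sets correspond to Python's | and &.

```python
import random


def simulate(steps=300, seed=1, simple_update=False):
    """Execute loop L with random changes to S and Q, checking (19)-(21) after each one.

    The universe, T, f and q are arbitrary test values (region constants of L).
    With simple_update=True the alternative c := c - A is used in place of (21).
    """
    rng = random.Random(seed)
    universe = range(30)

    def f(x):
        return x % 3

    q = 1
    T = set(rng.sample(universe, 18))
    S = set(rng.sample(universe, 15))
    Q = set(rng.sample(universe, 6))

    Temp = {x for x in T - S if f(x) == q}       # prologue (19)
    Diff = Temp - Q

    for _ in range(steps):
        A = set(rng.sample(universe, 4))
        if rng.random() < 0.5:
            new = {x for x in A & T if f(x) == q}
            Diff |= new - Q                        # (20), first statement
            Temp |= new                            # (20), second statement
            S -= A                                 # the change S := S - A
        else:
            Diff -= A if simple_update else (A & Temp)   # (21), or c := c - A
            Q |= A                                 # the change Q := Q + A
        # the maintained values must equal a fresh evaluation of (10)
        assert Temp == {x for x in T - S if f(x) == q}
        assert Diff == Temp - Q
    return True
```

Since c2 - Q is a subset of c2, removing A * c2 and removing A from c have the same effect, which is why the simpler update passes the same checks.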

In the following chapter, we will extend the techniques
just described to obtain an implementation of FD for
general expressions. We will also study SETL FD more
closely in order to prepare for several case studies of
algorithms derived by FD and other transformations.
IV. IMPLEMENTATION DESIGN FOR FORMAL DIFFERENTIATION
OF EXPRESSIONS CONTINUOUS IN SOME OF THEIR PARAMETERS

A.    Introduction 

In considering formal differentiation of expressions
involving discontinuity variables, we face complexities
which cannot be handled without extending the simple FD
framework of Chapter III (C). The additional information
needed to specify patterns for elementary discontinuous
expressions, for modifications to variables of such
expressions, and for associated update rules requires major
adjustments to the simple structure of the F and D tables
given in III (C). Moreover, a full FD system must cope
with a greater number of relevant basic expressions, a
more complicated assortment of induction variables, and a
host of competing alternative transformations whose
potential for program improvement is often unpredictable.
All these factors make it awkward, if not impossible, to use
FD algorithms based on simple variants of the F and D tables
of the preceding chapter. However, in this final chapter
we will modify those tables to support an FD implementation
design fashioned around a relatively straightforward
extension of the methods of III (C).

Although a fully automatic FD implementation is
conceivable (cf. [W1] for a discussion of FD at the PASCAL
level), we will not consider this but instead will augment
the semiautomatic FD techniques described in the last
chapter. Our extended SETL FD system will incorporate
many of the transformations described in Chapter II (D)
within an extension of the interactive program improvement
facility described in Chapter III (A,B). The use of the
proposed system will be illustrated by considering and
improving several sample SUBSETL programs.


B. Semiautomatic Formal Differentiation of Discontinuous Expressions.

In this chapter we consider general expressions

(1)    f(x_1, ..., x_k, x_{k+1}, ..., x_n)

continuous in some of their variables x_1,...,x_k, and
discontinuous in the remaining variables. To differentiate
expressions (1) in a program loop L, the general reduction
method of Chapter I and the particular transformations of
Chapter II (D) must accomplish two related tasks:

1. Store separate values of (1) in a map c(x_{k+1},...,x_n)
defined on entrance to L.

2. Keep c available throughout L by updating c whenever
variables on which c depends vary in L.

If each stored value c(x_{k+1},...,x_n) depends continuously
on changes to x_1,...,x_k, and if each derivative of c
involves an iteration over only a small portion of the domain
of c, we say that c is continuous relative to the continuity
variables of (1) and that x_{k+1},...,x_n are removable
discontinuities. This will usually imply that the execution
cost of the derivative code introduced by the second
step noted above will be low relative to a single
calculation of (1).
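As an illustration of our own choosing (it does not appear in the text), take the set former c(y) = {x ∈ s | f(x) = y}, which is continuous in s and discontinuous in y, with f a region constant. The Python sketch below performs the two tasks: it stores c for every value of y on entrance to the loop, and each change to s updates a single point of the domain of c, so y is a removable discontinuity in the sense just defined. The class and method names are hypothetical.

```python
from collections import defaultdict


class KeyedSetformer:
    """Keeps c(y) = {x in s | f(x) = y} available for every y inside a loop.

    s is the continuity variable and y the discontinuity variable; f is assumed
    to be a region constant.
    """

    def __init__(self, s, f):
        self.f = f
        self.c = defaultdict(set)
        for x in s:                        # task 1: store separate values of c on entry
            self.c[f(x)].add(x)

    def add(self, z):                      # task 2: derivative code for s := s + {z}
        self.c[self.f(z)].add(z)           # touches one point of the domain of c

    def delete(self, z):                   # derivative code for s := s - {z}
        self.c[self.f(z)].discard(z)

    def value(self, y):                    # replaces each evaluation of the set former
        return set(self.c.get(y, ()))
```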

In the system to be described we will accordingly
restrict reduction candidates to those expressions (1)
all of whose discontinuity variables are removable. We
will also limit our reduction methods to those explored
in Chapter II (D) (with the exception of memo function
techniques); these offer a high likelihood of attaining
program improvement under circumstances recognizable by
easy analysis. As in the set former case studied in
Chapter II, we aim at order of magnitude speedups.

An implementation design for semiautomatic FD of
discontinuous expressions can be based on the semiautomatic
approach described in Chapter III. To find reducible
expressions (with discontinuities allowed) in a loop L,
we will make use of a marking algorithm, Algorithm 1',
closely related to Algorithm 1 of Chapter III. Algorithm 1'
proceeds by first determining the elementary reduction
candidates Cands (these are reducible expressions having
no reducible subexpressions) before gathering up
the remaining nonelementary ones. We will also discuss a
reduction algorithm which we call Algorithm 2' (based largely
on Algorithm 2 of Chapter III) for differentiating
expressions marked by Algorithm 1'.

Both these algorithms make use of a table F of elementary
forms and a derivative code table D in a way similar to
the related algorithms of Chapter III. However, to deal with
the new problem areas raised by discontinuities, we will
make use of more complicated pattern constructs and more
sophisticated matching and macro expansion operations. As
we shall see, the major differences between the FD design
of Chapter III (C) and our new design for discontinuous
expressions will be localized in pattern handling. These
differences involve format changes in our F and D tables
and new versions of Match and Expand.

As in the case of completely continuous expressions,
reduction candidate expressions involving discontinuities
can be recognized as those which are formed by composition
and parameter substitution from a finite collection F of
elementary reducible forms. This recognition problem is
handled primarily by our new Match routine (presented in
Appendix E (v)) under the control of Algorithm 1'. Thus,
Match will not only perform the simple syntactic kind of
matching between a pattern f and an expression e (as
was done previously), but it will also check a variety of
restrictions imposed on each subexpression X of e matched
by a pattern variable x of f. Consequently, when
Match(e,f,Pfunc) holds, we know that e is reducible.


Although the aforementioned restrictions on subexpressions
X of e are fairly uniform and easily expressed for
completely continuous expressions, this is not the case
for discontinuous expressions. Indeed, for discontinuous
expressions, it will be convenient to test these restrictions
by executing boolean-valued function procedures Restrict(x,f),
defined for each form f ∈ F and for each pattern variable
x in f. During a matching operation using a pattern f,
just after the pattern variable x of f is matched to a
subexpression X, Restrict(x,f) must be executed and
return true in order for matching of x to succeed.
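A minimal Python sketch of a tree matcher with this hook follows. The encoding of patterns (strings beginning with '?' as pattern variables, tuples as operator nodes) is an assumption of the sketch, and `restrict` is a caller-supplied stand-in for Restrict(x,f). The matcher returns a new map rather than updating Pfunc in place, so a failed match leaves the caller's map untouched.

```python
def match(expr, pat, pfunc, restrict, form):
    """Match expr against pattern pat; return the extended pattern variable map, or None.

    A string starting with '?' is a pattern variable; a tuple (op, p1, ..., pk)
    matches a node with the same operator and arity; anything else must be equal.
    restrict(var, form, subexpr) plays the role of Restrict(x, f) and is evaluated
    just after var is bound to subexpr.
    """
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in pfunc:                                   # repeated variable: must agree
            return pfunc if pfunc[pat] == expr else None
        extended = {**pfunc, pat: expr}
        return extended if restrict(pat, form, expr) else None
    if isinstance(pat, tuple):
        if not (isinstance(expr, tuple) and len(expr) == len(pat) and expr[0] == pat[0]):
            return None
        for sub_expr, sub_pat in zip(expr[1:], pat[1:]):
            pfunc = match(sub_expr, sub_pat, pfunc, restrict, form)
            if pfunc is None:
                return None
        return pfunc
    return pfunc if expr == pat else None
```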

We can categorize pattern variables x into three basic types:
discontinuity, continuity, and special parameters. If x is
a discontinuity parameter, then Restrict(x,f) will return
true only if X consists entirely of free variables of f
and X is not a region constant.
In the case where x is a continuity parameter, we can
usually restrict X by the following predicate, which
Restrict(x,f) computes: X is a region constant expression,
or X is an induction variable belonging to IV(x,f), or
X is an induction expression for f and x. Note here
that the sets IV(x,f), serving essentially the same purpose
as in the last chapter (cf. p. 392), can be computed for
every f in F and each continuity parameter x in f. General
routines to compute the set RC of region constants and the
IV sets are given in Appendix E (iv).
When x is a special parameter, Restrict(x,f) must be
programmed in a highly particular and unsystematic way.
As an example, consider the special parameter K occurring
as part of the following setformer pattern:

f = {y ∈ x1 | K}.

For this case, we might want Restrict(K,f) to implement
the following predicate: y is a variable occurring within
K, and for each variable g occurring in K one of the
following conditions must hold:

1. g is a region constant.

2. g = y.

3. g is a map variable which only occurs in K as a map
retrieval involving y, and is also an induction variable
which can only vary by indexed assignments.

In practice our patterns will make use of only a very
few special procedures of the kinds just mentioned.
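The predicate for Restrict(K,f) can be sketched in Python as follows. The nested-tuple encoding of K, the reading of "a map retrieval involving y" as exactly g(y), and the parameter `indexed_maps` (the induction variables known to vary only by indexed assignments) are all assumptions of this sketch.

```python
def restrict_K(K, y, rc, indexed_maps):
    """Sketch of Restrict(K, f) for the set former pattern f = {y in x1 | K}.

    K is encoded as nested tuples: ('apply', g, arg) is the map retrieval g(arg);
    any other tuple (op, a, b, ...) is an operator node; strings are variables;
    other values (numbers) are literals.
    """
    plain, retrievals = [], []            # plain variable occurrences; (map, argument) pairs

    def walk(e):
        if isinstance(e, str):
            plain.append(e)
        elif isinstance(e, tuple) and e and e[0] == "apply" and isinstance(e[1], str):
            retrievals.append((e[1], e[2]))
            walk(e[2])
        elif isinstance(e, tuple):
            for sub in e[1:]:
                walk(sub)

    walk(K)
    names = set(plain) | {g for g, _ in retrievals}
    if y not in names:                    # y must occur within K
        return False
    for g in names:
        if g in rc or g == y:             # conditions 1 and 2
            continue
        only_at_y = g not in plain and all(arg == y for h, arg in retrievals if h == g)
        if not (g in indexed_maps and only_at_y):   # condition 3
            return False
    return True
```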

Since matching and expansion operations using a pattern
P will always visit the nodes of P in postorder, we can
specify when a particular procedure pname should be executed
during either of these operations by inserting the term
!pname at an appropriate place in P. During matching,
procedures will usually be used for validating expressions
matched to pattern variables. Thus, we will frequently
insert a procedure name !procname within a pattern
immediately after the occurrence of a pattern variable
patname, and will allow procname to refer to patname.
By allowing patterns to contain procedure names, we
gain considerable power. However, in order to provide a
practical pattern handling capability, it is necessary to
include a few additional features. The routines Match and
Expand shown in Appendix E (v) implement these features.
In particular, these routines can handle the intricate
patterns specified in Appendix C (iii) for the FD tables
used with our SETL implementation design discussed in the
next section.

For practical reasons, it is of considerable importance
to allow a single pattern to match different variants of
the same expression. To achieve this, we allow patterns
to be built using alternation of subpatterns. We borrow
SNOBOL notation for this. Using alternation, we can specify
the entire elementary form table F used for FD as a
single pattern,

(2)    F = Form_1 | Form_2 | ... | Form_n

If Pfunc is the pattern variable map constructed during
matching, then failure to match an alternand of F causes
Pfunc to be restored to the value it had just before F
was matched. Note that alternation provides a mechanism for
choosing between competing transformations.
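Alternation with restoration of Pfunc can be sketched in Python as follows. Here the matcher deliberately updates the map in place, so a failed alternand may leave partial bindings behind. The alternation routine saves a copy of Pfunc before each alternand and restores it on failure, as described above. The '?'-prefixed pattern variables and tuple-encoded trees are assumptions of this sketch.

```python
def match_into(expr, pat, pfunc):
    """Destructive matcher: binds '?'-variables in pfunc as it goes.

    On failure, bindings made before the mismatch are left in pfunc.
    """
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in pfunc:
            return pfunc[pat] == expr
        pfunc[pat] = expr
        return True
    if isinstance(pat, tuple):
        return (isinstance(expr, tuple) and len(expr) == len(pat) and expr[0] == pat[0]
                and all(match_into(e, p, pfunc) for e, p in zip(expr[1:], pat[1:])))
    return expr == pat


def match_alternation(expr, alternands, pfunc):
    """F = Form_1 | Form_2 | ... | Form_n, tried from left to right.

    Pfunc is saved before each alternand and restored if that alternand fails.
    Returns the index of the successful alternand, or None.
    """
    for i, pat in enumerate(alternands):
        saved = dict(pfunc)
        if match_into(expr, pat, pfunc):
            return i
        pfunc.clear()
        pfunc.update(saved)       # discard bindings made by the failed alternand
    return None
```

With alternands ('-', '?a', '?a') and ('-', '?x1', '?x2') applied to T - S, the first alternand binds ?a to T before failing on S; restoring the map keeps that stale binding out of the result of the second alternand.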

As a notational convenience, we allow pattern names
to be associated with pattern expressions. This is achieved
by assignments of the form

(3)    <pattern name> = <pattern expression>

This feature allows us to use pattern names (in place of
the pattern expressions they represent) as part of pattern
expressions. It may be convenient to use several assignments
of the form (3) to synthesize a single pattern expression.

We will sometimes need to use a pattern which matches
all of the components of a parameter list of arbitrary size.
Such a pattern can be specified using the following
recursively defined pattern name:

(4)    Params = q3. ',' Params | q3.

where the comma in quotes is a literal and q3. is a special
pattern variable. The pattern assignment (4) borrows the
notion of recursive pattern definition from SNOBOL. (Note
that the right-hand side occurrence of Params in (4) is
treated in the same way as a SNOBOL unevaluated expression.)
The period appearing immediately after the pattern
variable q3 denotes that each time q3. is encountered
during matching, a unique pattern variable (which
we call an instance of q3.) is generated. Each such instance
will have the form q3i, where i-1 is the number of
previous instances generated. When matching succeeds for
q3i, q3i will refer to the text which is matched. If
matching fails, a previous system state s will be restored
(i.e., the pattern variable map will be restored), and instances
of q3. generated after the state s was last saved are lost.
Furthermore, the underlying counter #q3 which we use to
maintain the number of q3. instances is also restored.
For example, if the left alternand of (4) fails in matching
the comma just after q3i has been generated, the association
between q3i and the text it matched will be destroyed, i will
be reset to i-1, and matching will proceed with the right
alternand of (4).
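The instance mechanism for q3. can be sketched in Python for a parameter list that has already been split into operand texts and ',' literals (an assumption made to keep the example short). The matching state is the pair (Pfunc, #q3). Because each alternand builds a new state instead of mutating the old one, falling back to the right alternand of (4) automatically discards the failed instance and resets the counter.

```python
def match_params(tokens, pos, state):
    """Params = q3. ',' Params | q3.

    tokens: the parameter list, already split into operand texts and ',' literals.
    state:  (pfunc, count), where count plays the role of the counter #q3.
    Each time q3. is met, a fresh instance 'q3<count+1>' is bound to one operand.
    Returns (new_state, position after the match) or None.
    """
    def q3_instance(p, st):
        pfunc, count = st
        if p >= len(tokens) or tokens[p] == ",":
            return None
        name = f"q3{count + 1}"
        return ({**pfunc, name: tokens[p]}, count + 1), p + 1

    first = q3_instance(pos, state)              # left alternand: q3. ',' Params
    if first is not None:
        st, p = first
        if p < len(tokens) and tokens[p] == ",":
            rest = match_params(tokens, p + 1, st)
            if rest is not None:
                return rest
    return q3_instance(pos, state)               # right alternand: q3., from the saved state
```

For the list t + b, a the instances q31 = t + b and q32 = a are generated; at the last operand the left alternand fails on the missing comma, and the right alternand reuses the number 2 because the failed increment was discarded.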

Some of the remaining features of our pattern facility
are illustrated in the following pattern definition used
for matching setformers:

(5)    Form = ['{' [x '∈' x1] '|' [K] '}']

Note that the pattern expression (5) uses quoted symbols
to denote literals, pattern variables x and x1, the pattern
name K, and predecessor formation brackets used to express
tree structure for (5); this corresponds to the following
tree representation:

[Figure: tree representation of pattern (5); not recoverable from the source.]

In (5) we intend K to be a pattern name which matches
a conjunction of terms in a rather general way. If we
use a pattern name Conj for matching conjuncts, then K can
be defined recursively as follows:
(6)    K = [Conj] '&' K | [Conj]

where Conj is defined by the rule

(7)    Conj = [F8. '(' [x] ')' '=' '0']
            | K5.
            | !x '∈' x2
            | !K4 '∈' [F2 '(' [Params] ')']
            | q4. '∈' [F3. '(' [x] ')']

The special symbol ! appearing in (7) triggers execution
of a 'built-in' procedure whenever this symbol is
encountered during matching. The effect of this procedure
is to set up a gate for each occurrence of the symbol !.
Initially, all such gates are 'open'. During matching,
whenever a closed gate is encountered, failure occurs;
whenever an open gate is reached, matching proceeds
through the gate but leaves the gate closed. Since we store
the state of each gate within the pattern variable map Pfunc,
closed gates may be reopened when failure occurs and a
previous state of Pfunc is restored. In connection with
pattern (5), these rules imply that any setformer matched by
(5) can have within its boolean subpart at most one conjunct
matched by x '∈' x2 and at most one conjunct matched by
K4 '∈' [F2 '(' [Params] ')'].
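The gate mechanism can be sketched in Python. This is a hypothetical model, not the Appendix E matcher. Conjuncts are plain strings, and each alternand of Conj is a (name, test, gated) triple. Gate states live in the pattern variable map, so discarding a failed trial reopens any gate that the trial had closed.

```python
from typing import Callable, Optional

# One alternand of Conj: (pattern variable name, test on a conjunct, gated?).
# gated=True models a "!" in front of the alternand, as in (7).
Alternand = tuple[str, Callable[[str], bool], bool]


def match_conjuncts(conjuncts: list[str], alternands: list[Alternand],
                    pfunc: Optional[dict] = None) -> Optional[dict]:
    """Match each conjunct to some alternand, backtracking on failure.

    Gate states are stored in pfunc under ("gate", name), so abandoning a
    failed trial (i.e. falling back to the earlier pfunc) reopens every
    gate that the trial had closed.
    """
    pfunc = {} if pfunc is None else pfunc
    if not conjuncts:
        return pfunc
    head, rest = conjuncts[0], conjuncts[1:]
    for name, test, gated in alternands:
        gate = ("gate", name)
        if gated and pfunc.get(gate) == "closed":
            continue                    # a closed gate: this alternand fails
        if not test(head):
            continue
        trial = dict(pfunc)             # pfunc itself remains the saved state
        if gated:
            trial[gate] = "closed"      # pass through the gate, leaving it closed
        trial[name] = trial.get(name, ()) + (head,)
        result = match_conjuncts(rest, alternands, trial)
        if result is not None:
            return result
    return None                         # no alternand led to a complete match


alternands = [
    ("x_in_x2", lambda c: c.startswith("x in "), True),  # like !x '∈' x2
    ("F8", lambda c: c.endswith("= 0"), False),           # like [F8.(x)] = 0
    ("special", lambda c: c == "x in Q", False),          # stand-in for K5.
]
print(match_conjuncts(["x in Q", "x in R"], alternands))
```

In this call, "x in Q" first passes the gate of x_in_x2. The second conjunct then fails at the closed gate. Backtracking rematches "x in Q" with the ungated alternand, which reopens the gate for "x in R".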

To adapt (5) for matching differentiable expressions,
we must modify (5) by inserting procedure names. Suppose
that Dvar, Cvar, and Svar are boolean-valued procedures
which validate discontinuity, continuity, and special
parameters, respectively. (Recall that we sketched the
logic for these procedures earlier in this section.)
Then within the pattern expressions (4), (5) and (7)
we should insert !Dvar after each pattern variable whose
name begins with the letter q, insert !Svar after each
pattern variable whose name begins with the letter K, and
insert !Cvar after all remaining pattern variables with the
exception of x.

After (5) is altered in this way, it will be able to
match parse trees for an assortment of reducible SETL
setformers. For example, (5) matches the parsed form of the
following SETL setformer:

(8)   {x ∈ s | t ∈ g(x) & f(g(x + 2)) = 0
             & x**2 ∈ h(t+b,a) & d*t ∈ f(x)
             & x ∈ Q & f(x) = 0}

The pattern variable map Pfunc which results from matching
(5) to (8) will associate pattern variables of (5) with
the text of (8) in the following way:


(9)   x = x,  x1 = s,  q41 = t,  q42 = d*t,  F31 = g,
      F32 = f,  K51 = f(g(x + 2)) = 0,  x2 = Q,  K4 = x**2,
      F2 = h,  q31 = t + b,  q32 = a,  F81 = f.

Once defined, Pfunc can be used to trigger macro
expansion operations which produce code to reduce (8).
We treat macro expansion as the inverse of pattern matching,
so that all of the operations just described for pattern
matching can also be used for macro expansion. After matching
a pattern P to a parse tree T, we obtain a map Pfunc which
associates each pattern variable x of P with a matched
subtree Pfunc(x) of T. Macro expansion will reproduce T
starting from P and Pfunc. (Of course, it is not at all
practical to force the converse to hold.)

A few special features of macro expansion must be
noted. Suppose we use the map Pfunc which results from
matching (5) to (8) and expand the pattern Exp defined
below:

(10)  Exp    = [E* '(' [Params] ')'] | E*
      Params = Param ',' Params | Param
      Param  = q3. | q4.

The symbol * occurring within (10) has special significance
only in connection with patterns used for macro expansion.
In the case of (10), when E* is encountered during expansion,
a new program variable name newname and a new blank atom n
are generated, where Leaf(n) = True and Label(n) = newname.
We then associate the pattern variable E with n by making
the SETL assignment Pfunc(E) := n.

As noted in Chapter 3, expansion proceeds by visiting
the nodes of a pattern tree in postorder. Recall that
expansion of a pattern tree P and a map Pfunc produces a
new parse tree T which is formed from P essentially by
replacing each pattern variable x in P by a subtree Pfunc(x).

We also permit expansion of patterns, such as (10),
formed using alternation. To see how alternation works,
we first note that a subpattern P' of a pattern P can only
fail during expansion when P' contains a pattern variable
not in the domain of the pattern variable map Pfunc. Failure
causes Pfunc, as well as the expanded parse tree T, to
be restored to their values just before the expansion of P'.
If P' is an alternand of a subpattern P' | Q, failure of P'
will cause expansion to proceed with Q.

Dotted elementary variables such as q3. occurring in
(10) behave the same way in expansion as in matching. When
q3. is first encountered, expansion will treat q3. in the
same way as q31, the first instance generated by q3. After
the (i-1)st instance q3(i-1) of q3. is expanded successfully
and q3. is encountered again, we attempt to use the next
instance q3i.

As an example, note that expansion of (10) using
the Pfunc map defined by (9) yields the following
expression:

(11)  c(t+b,a,t,d*t)

where c is a new variable name generated during expansion.
As part of our FD procedure we will use the map retrieval
operation (11) to replace the occurrence of (8) being
reduced.
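The following is a minimal Python sketch of expanding (10) under the map (9). It assumes a hypothetical tuple encoding of pattern trees and emits text rather than parse trees. Dotted variables keep their next-instance counter inside the map, so a failed alternand restores the counter along with everything else. E* simply binds E to a name supplied by a fresh-name function.

```python
from typing import Callable, Optional

# Hypothetical pattern encoding:
#   ("lit", s)       literal text           ("var", x)   pattern variable
#   ("star", E)      E*: bind E to a new name
#   ("dotted", q)    q. : next unused instance q1, q2, ...
#   ("ref", rule)    named pattern          ("seq", [...]) / ("alt", P, Q)


def expand(p: tuple, pfunc: dict, rules: dict,
           fresh: Callable[[], str]) -> Optional[tuple[str, dict]]:
    """Return (text, extended pfunc), or None if expansion fails.

    pfunc is never mutated, so an alternand that fails leaves the caller's
    pfunc (including the dotted-variable counters) exactly as it was.
    """
    kind = p[0]
    if kind == "lit":
        return p[1], pfunc
    if kind == "var":
        return (pfunc[p[1]], pfunc) if p[1] in pfunc else None
    if kind == "star":
        out = dict(pfunc)
        out[p[1]] = fresh()
        return out[p[1]], out
    if kind == "dotted":
        i = pfunc.get(("next", p[1]), 1)
        instance = f"{p[1]}{i}"
        if instance not in pfunc:
            return None             # no instance left: this subpattern fails
        out = dict(pfunc)
        out[("next", p[1])] = i + 1
        return pfunc[instance], out
    if kind == "ref":
        return expand(rules[p[1]], pfunc, rules, fresh)
    if kind == "seq":
        parts = []
        for sub in p[1]:
            r = expand(sub, pfunc, rules, fresh)
            if r is None:
                return None
            text, pfunc = r
            parts.append(text)
        return "".join(parts), pfunc
    if kind == "alt":               # try P; on failure retry Q from the same pfunc
        return (expand(p[1], pfunc, rules, fresh)
                or expand(p[2], pfunc, rules, fresh))
    raise ValueError(f"unknown pattern kind: {kind!r}")


rules = {  # pattern (10)
    "Exp": ("alt", ("seq", [("star", "E"), ("lit", "("), ("ref", "Params"),
                            ("lit", ")")]),
            ("star", "E")),
    "Params": ("alt", ("seq", [("ref", "Param"), ("lit", ","), ("ref", "Params")]),
               ("ref", "Param")),
    "Param": ("alt", ("dotted", "q3"), ("dotted", "q4")),
}
pfunc9 = {"x": "x", "x1": "s", "q41": "t", "q42": "d*t", "F31": "g",   # (9),
          "F32": "f", "x2": "Q", "K4": "x**2", "F2": "h",              # without K51
          "q31": "t+b", "q32": "a", "F81": "f"}
text, final = expand(rules["Exp"], pfunc9, rules, fresh=lambda: "c")
print(text)  # c(t+b,a,t,d*t)
```

When q33 is missing, the alternand q4. takes over. When q43 is missing as well, the recursive branch of Params fails and its single-Param alternand supplies the last parameter.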
We can now give an example illustrating the structure
of our derivative table D. Suppose that within this table,
the entry D(x1,Form) for the elementary form (5) and pattern
variable x1 is a set which contains the triple [mod,preD, ].
Let the modification pattern mod be defined by the rule

(12)  mod = [x1 ':=' [x1 '+' A] ';']

and the prederivative pattern preD by the following pattern
definitions:

(13)  preD     = ['(' '∀' [iterator] ')' [Add] 'end' '∀' ';']
      iterator = iterpart '|' [K]
      iterpart = [iter] ',' iterpart | [iter]
      iter     = !x '∈' '(' [A '-' x1] ')' |
                 !w2* '∈' ['{' [u* '∈' ['Project' '('
                     [#q3 ',' F2] ')']] '|'
                     [K4 '∈' [F2 '(' [u] ')']] '}'] |
                 w3.* '∈' [F3. '(' [x] ')']
      K        = [Conj] '&' K | [Conj]
      Conj     = !x '∈' x2 |
                 [F8. '(' [x] ')'] '=' '0' |
                 K5.
      Add      = P*= [E '(' [Params] ')'] ':='
                 [P '+' '{' [x] '}'] ';'
      Params   = Param ',' Params | Param
      Param    = !w2 | w3.
A few clarifying remarks are needed to explain this example.
The subpatterns underlined within iter match parsed
expressions which must be further reduced. The pattern
variable #q3 occurring within iter is treated as a
literal number whose value is the number of instances of q3.
occurring within the map Pfunc used to expand preD.
Finally, the use P*= of the pattern variable P within Add
causes Pfunc(P) to be defined as the root of the subtree
expanded from the pattern tree occurring just to the right
of P*=.

We can use (12) and (13) to illustrate how we determine
a derivative for (8) relative to a change s := s + p(y).
Starting with the map Pfunc which becomes available after
expansion of (10), we match the pattern mod to the
change in s. Pfunc will reflect the successful match by
associating A with p(y). Next we use Pfunc to expand
preD. The prederivative code which results is

(14)  (∀x ∈ (p(y) - s), u ∈ {v ∈ Project(2,h) | x**2 ∈ h(v)},
          w ∈ g(x), z ∈ f(x) | x ∈ Q & f(x) = 0 & f(g(x+2)) = 0)
          c(u, w, z) := c(u, w, z) + {x};
      end ∀;

where u, w, and z are new names generated for the pattern
variables w2, w31, and w32, respectively.
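The effect of (14) can be checked with a small Python model, under simplifying assumptions. The map c is keyed by tuples of values drawn from per-element key sets, which stand in for the iterators over u, w and z. cond stands in for the conjunction. The function add_to_s is hypothetical. Like (14), which iterates over p(y) - s, it applies the prederivative before the change to s.

```python
from collections import defaultdict
from itertools import product


def recompute(s, cond, key_sets):
    """Fresh calculation of c(k) = {x in s | cond(x) and k in product(key_sets(x))}."""
    c = defaultdict(set)
    for x in s:
        if cond(x):
            for k in product(*key_sets(x)):
                c[k].add(x)
    return c


def add_to_s(s, c, delta, cond, key_sets):
    """Prederivative code for s := s + delta, then the change itself.

    As in (14), only the genuinely new elements delta - s can alter c,
    so they are examined before s is modified.
    """
    for x in delta - s:
        if cond(x):
            for k in product(*key_sets(x)):
                c[k].add(x)
    s |= delta


def cond(x):            # hypothetical stand-in for the conjunction
    return x % 2 == 1


def key_sets(x):        # hypothetical stand-ins for the iterators over u and w
    return ({x % 3}, {x, x + 1})


s = {1, 2, 3}
c = recompute(s, cond, key_sets)
add_to_s(s, c, {3, 4, 5, 7}, cond, key_sets)
print(c == recompute(s, cond, key_sets))  # True
```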

A more formal description of our pattern notation can
be found in Appendix C (iii). With each pattern specified
using our nonprocedural pattern language, we can associate
a pattern tree. The rules of correspondence between pattern
specifications and pattern trees are found in Appendix E (v).
The patterns specified in Appendix C (iii) make comprehensive
use of our pattern language. We handle these patterns
by compiling them into pattern tree form and using the
Match and Expand routines shown in Appendix E (v).

The major differences between FD implementations for
entirely continuous expressions and for discontinuous
expressions are found in our pattern language. A few
additional differences will be pointed out in the following
discussion concerning a general FD implementation.

A D table much like that of the FD framework of
Chapter III can play essentially the same role in the
present formulation. For each elementary form f in F
and every continuity parameter x of f, we again use a
set D(x,f) of triples [mod, preD, postD], where mod is a
parameter change pattern, preD stands for a prederivative
code macro, and postD for a postderivative code macro. As
before, we use an F table together with our D table
to find induction variables, to find an initial set
of reduction candidates, and to expand this initial set
to a more general set of reduction candidates by means
of the following variant of Algorithm 1.
Algorithm 1'.

1. Find the set RC of region constants and the sets of
induction variables.

2. Arrange the nodes of L in postorder; i.e., perform
the assignment T := Postorder(n), where n is the root node
of L and Postorder is the SETL procedure given in
Appendix E (iv).

3. For all n ∈ T such that ∃f ∈ F for which Match(n, f, Pfunc)
holds, perform step 4.

4. Place n in each of the induction sets for which the
matched expression f is an induction expression. Use a map
Reduce to record the values of f and Pfunc at n by executing
the assignment Reduce(n) := [f, Pfunc]. (Reduce will be
used in connection with the reduction routine Algorithm 2',
to be described later.) Finally, mark n 'reducible'.

Once Algorithm 1' has executed, all reducible expressions
(with or without discontinuities) will be marked. Note that
the postordering used in step 3 ensures that we visit a
reducible expression e only after first visiting all
reducible subexpressions of e. This is critical to our
algorithm, since determining reducibility for e depends on
establishing that the reducible subexpressions of e belong to
appropriate induction sets; that analysis must be done before
e is examined.
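Algorithm 1' can be sketched in Python as follows. The two forms, the Node class, and the rule that a marked node counts as an induction expression are all simplifying assumptions. The point is only that visiting nodes in postorder lets the test for a node rely on marks already placed on its subexpressions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)            # identity-based hashing: every node is distinct
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def postorder(n):
    """Children before parents, as in step 2."""
    out = []
    for child in n.children:
        out.extend(postorder(child))
    out.append(n)
    return out


def algorithm_1_prime(root, forms, induction_vars):
    """Return Reduce as a dict node -> (form name, pfunc); its keys are the marked nodes.

    Simplification: a marked node is treated as an induction expression.
    """
    reduce_map = {}

    def is_induction(n):
        return n in reduce_map or (not n.children and n.label in induction_vars)

    for n in postorder(root):                  # step 3, in postorder
        for name, match in forms.items():
            pfunc = match(n, is_induction)
            if pfunc is not None:              # step 4
                reduce_map[n] = (name, pfunc)
                break
    return reduce_map


def match_union(n, is_induction):              # hypothetical form: a + b
    if n.label == "+" and len(n.children) == 2 and all(map(is_induction, n.children)):
        return {"a": n.children[0], "b": n.children[1]}
    return None


def match_card(n, is_induction):               # hypothetical form: #a
    if n.label == "#" and len(n.children) == 1 and is_induction(n.children[0]):
        return {"a": n.children[0]}
    return None


union = Node("+", [Node("s"), Node("t")])
root = Node("#", [union])
reduce_map = algorithm_1_prime(root, {"union": match_union, "card": match_card},
                               induction_vars={"s", "t"})
print([reduce_map[n][0] for n in postorder(root) if n in reduce_map])
```

The node #(s + t) is marked only because s + t was visited, and marked, before it.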

Also, observe that the routine Match invoked in step 3
is the augmented matching procedure given in Appendix E (v).
It handles tree representations of the patterns found in F,
and performs various checks in addition to the purely
syntactic tree matching operations used in Chapter 3 (where
only purely continuous expressions are handled). In
particular, we allow Boolean function code procedures to be
included as part of our pattern structures. When these
procedures are encountered during matching, they are executed,
and they must return True for matching to succeed.

It is worthwhile to make one more remark in connection
with Algorithm 1'. By using the induction variable procedure
shown in Appendix E (iv), Algorithm 1' could be implemented
in a largely language-independent manner. However, this
procedure constructs induction variable sets IV(x,f) associated
with every continuity parameter x of every elementary
form f, and, as noted in the cases of Fortran and SETL
(cf. Chapter 3 (2.3, 3.2)), this approach can be too
costly in space. Thus, for each particular FD implementation
we prefer to work out ad hoc induction variable routines
(as we did for Fortran and SETL) which classify induction
variables into fewer categories than the general procedure
just noted. Moreover, for each separate implementation of
Algorithm 1' we may have to redesign step 4, where we classify
induction expressions according to the same categories
defined for induction variables.

Algorithm 1' prepares for the semimanual selection
and automatic reduction of expressions found in a program
loop L. This process of selection and reduction is
accomplished by Algorithm 2', whose logic varies only
slightly from that of Algorithm 2 of Chapter 3.

When a user selects an expression EXP for reduction
by issuing a command

(15)  $FD, LOOP#, NAME = EXP

Algorithm 2' will determine the discontinuities of EXP,
the arity of the map NAME which will store values of EXP,
the code initializing NAME (which becomes part of the
prologue of the loop designated by LOOP#), and finally
the derivative code to be inserted into the loop.
Algorithm 2' will also determine which subexpressions of
EXP to reduce in order to make differentiation profitable.
With this in mind, we now outline Algorithm 2'.

Algorithm 2'.

1. For the command (15) to be valid, NAME must not be an
existing program variable, and LOOP# must refer to a program
loop L which contains the expression EXP at a node n marked
'reducible' by Algorithm 1'.

2. If the validation check of step 1 is passed, we order the
reducible subexpressions of EXP in postorder by executing

Cands := [x ∈ Postorder(n) | Marked(x) = 'reducible']

3. For each node t selected from Cands, assign
Reduce(t)(1) to f, compute Pfunc by executing Match(t, f, Pfunc),
and perform steps 5-8.

4.  Halt. 
5. Generate a unique name v_f for the variable which will
keep the value of f available in L. If f has discontinuities,
then v_f will be a map storing separate values of f, each
corresponding to particular constant values of the
discontinuities; otherwise, it will be a nonmap variable
storing only the single value of f. In either case we must
initialize v_f at the end of the prologue for L.

6. For each induction variable x in f, and for each
program point p ∈ Defs(x) at which the value of x undergoes
change, insert derivative code which keeps v_f available
in L. Derivative code can be generated in essentially
the same way as in Algorithm 2 despite possible occurrences
of discontinuities within f, except that we must use the
more powerful Match and Expand utilities given in Appendix
E (v).

7. Within L replace each occurrence of f by the map
retrieval v_f(q1,...,qk), where q1,...,qk are the
discontinuity parameters of f; however, if f has no
discontinuities it can be replaced by a simple variable v_f.
Within the derivative code generated in step 6, replace any
expression e which has already been reduced by an appropriate
simple variable v_e when e is entirely continuous, or by the
map retrieval v_e(b1,...,bm), where b1,...,bm are the
discontinuities of e. Next make appropriate additions to the
set RC of region constants and to the induction sets IV.
After this, mark each node n 'reducible' if n is introduced
by the derivative code in step 6 and requires further
reduction. Reduce all such expressions using recursive
application of Algorithm 2'.

8. This last step is identical to that of Algorithm 2.
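Steps 1 and 2 and the retrieval terms of step 7 of Algorithm 2' are small enough to sketch directly. The following Python fragment is illustrative only. Node, validate, candidates and retrieval_term are hypothetical names, and marks are kept in an ordinary dictionary.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)            # identity-based hashing: every node is distinct
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def postorder(n):
    return [m for c in n.children for m in postorder(c)] + [n]


def validate(name, program_vars, node, marked):
    """Step 1: NAME must be new, and EXP's node must be marked 'reducible'."""
    return name not in program_vars and marked.get(node) == "reducible"


def candidates(node, marked):
    """Step 2: Cands := [x in Postorder(n) | Marked(x) = 'reducible']."""
    return [x for x in postorder(node) if marked.get(x) == "reducible"]


def retrieval_term(var_name, discontinuities):
    """Step 7: v_f(q1,...,qk), or the simple variable v_f if there are none."""
    if not discontinuities:
        return var_name
    return f"{var_name}({','.join(discontinuities)})"


inner = Node("+", [Node("s"), Node("t")])
exp = Node("{}", [inner, Node("q")])
marked = {inner: "reducible", exp: "reducible"}
print(validate("C", {"s", "t", "q"}, exp, marked))   # True
print([n.label for n in candidates(exp, marked)])    # ['+', '{}']
print(retrieval_term("C", ["q*t"]))                  # C(q*t)
```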

Most of the details involved in actually implementing
Algorithm 2' are either straightforward or follow
immediately from the discussion of Algorithm 2. However, two
new problems arise. Initialization of the map variables v_f
used in step 5 of Algorithm 2' for storing values of
discontinuous expressions is more complicated and diverse
than initialization in the case of completely continuous
expressions. In step 7, we must use new techniques to
identify different occurrences of the same discontinuous
expression and to replace these occurrences by appropriate
map retrieval operations.

To illustrate the initialization problem, we consider
the following SETL example:

(16)  c(y1,y2) = [+: x ∈ f(y1,y2)] 1

where y1 and y2 are the only discontinuity parameters
of (16). When f is a programmer-defined map, initialization
for c involves a straightforward iteration over the domain
of f; i.e.,

(17)  c := null set;
      (∀[b1,b2] ∈ Project(2,f))
          c(b1,b2) := [+: x ∈ f(b1,b2)] 1;
      end ∀;
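The incremental version is discussed next. For comparison with it, the straightforward prologue code (19) followed by (17) can be modeled in Python, with dictionaries standing in for SETL maps. Here g and h are arbitrary stand-in functions, and each key of f plays the role of a pair [b1,b2] in Project(2,f).

```python
def init_f(s, g, h):
    """(19): f(g(w),h(w)) collects the elements w of s having those values."""
    f = {}
    for w in s:
        key = (g(w), h(w))
        if key in f:                   # [g(w),h(w)] in Project(2,f)
            f[key] = f[key] | {w}
        else:
            f[key] = {w}
    return f


def init_c_from_f(f):
    """(17): a second pass over f, setting c(b1,b2) := [+: x in f(b1,b2)] 1."""
    c = {}
    for (b1, b2), members in f.items():
        c[(b1, b2)] = sum(1 for _ in members)
    return c


def g(w):                              # hypothetical stand-ins for g and h
    return w % 2


def h(w):
    return w % 3


f = init_f(set(range(10)), g, h)
c = init_c_from_f(f)
print(c[(0, 0)], c[(0, 1)])            # 2 1
```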
Suppose, on the other hand, that f is a compiler-generated
variable resulting from reduction of the setformer

(18)  f(y1,y2) = {x ∈ s | g(x) = y1 & h(x) = y2}

Reduction of (18) will cause the following initializing
code

(19)  f := null set;
      (∀w ∈ s)
          if [g(w), h(w)] ∈ Project(2,f) then
              f(g(w),h(w)) := f(g(w),h(w)) + {w};
          else
              f(g(w),h(w)) := {w};
          endif;
      end ∀;

to be inserted at the end of the prologue for the loop L
containing (18). To initialize c, we could certainly use
the code (17) inserted immediately after (19) in the
prologue. However, it is profitable to exploit the
incremental way in which f is defined in (19) in order to
produce better initializing code than (17) for c. Essentially,
we can formally differentiate c relative to the changes
to f occurring in (19). Inserting the prederivative code
for c into (19), we come up with the following code, which
initializes both f and c together:
(20)  c := null set;
      f := null set;
      (∀w ∈ s)
          if [g(w),h(w)] ∈ Project(2,f) then
              c(g(w),h(w)) := c(g(w),h(w)) + 1;
              f(g(w),h(w)) := f(g(w),h(w)) + {w};
          else
              c(g(w),h(w)) := 1;
              f(g(w),h(w)) := {w};
          endif;
      end ∀;

Moreover, if in step 8 of Algorithm 2' we make f dead on
entrance to the optimization loop L, then it is easy to
eliminate all assignments to f and uses of f within (20).
After we replace the term Project(2,f) with Project(2,c),
all assignments to f within (20) can be removed as dead
code. Note that it would not be so easy to eliminate f
from the prologue in the case of (17).
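Once f is dead and Project(2,f) has been replaced by Project(2,c), what remains of (20) is a single pass that maintains only c. Below is a Python sketch of that residual code, again with dictionaries standing in for SETL maps and with hypothetical g and h. The test "key in c" can replace the test on Project(2,f) because every key of f receives a count of at least 1.

```python
def init_c_incrementally(s, g, h):
    """Residue of (20) after f is eliminated: Project(2,f) becomes Project(2,c)."""
    c = {}
    for w in s:
        key = (g(w), h(w))
        if key in c:                   # was: [g(w),h(w)] in Project(2,f)
            c[key] += 1                # prederivative for f(key) := f(key) + {w}
        else:
            c[key] = 1                 # prederivative for f(key) := {w}
    return c


def g(w):                              # hypothetical stand-ins for g and h
    return w % 2


def h(w):
    return w % 3


print(init_c_incrementally(set(range(10)), g, h))
```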

The preceding illustration of incremental initialization
exemplifies a general initialization technique based
on the following idea. Whenever Algorithm 2' chooses to
reduce an expression e, we know that e must be elementary
and that all of its outermost reducible subexpressions are
already reduced. These subexpressions are initialized in
the loop prologue according to the postorder in which they
are chosen for reduction. Whenever it is feasible, we will
plan to initialize e differentially relative to the
incremental initialization of the last reducible
subexpression e' of e occurring within the prologue.

To implement this method we will use an initialization
code table Init. With each elementary form f ∈ F we associate
a pair Init(f) = [def, Deriv], where def is a pattern to be
used as a code macro for generating a straightforward
initialization typified by (17), and Deriv is a partially
defined map associating pattern variables x of f with sets
Deriv(x) of pairs of the form [mod, preD]. We then use
Deriv for differential initialization (cf. the discussion
of (20)), in which the pairs [mod, preD] serve the same
purpose as the entries of the D table.

To determine initialization code for an expression e
we take the following steps. Suppose that Match(e, f, Pfunc)
holds. Then we execute the SETL assignment

[def, Deriv] := Init(f)

If e has no reducible subexpressions, or if its outermost
reducible subexpression e' (whose initialization code occurs
last in the prologue) is matched by a pattern variable x
of f for which x ∉ Dom Deriv, proceed as in case 1 below;
otherwise proceed as in case 2.

1. In this case, we use the nonincremental approach.
This requires insertion of the code generated by
Expand(def, Pfunc) at the end of the prologue.

2. To initialize e differentially, we first suppose
that reduction of e' generates the map variable v_e', which
holds values of e'. Then, within the initialization code
for e', at each point p where v_e' varies we must find a
unique pair [mod, preD] belonging to the set Deriv(x) such
that mod matches p using Pfunc, and we insert the
prederivative code Expand(preD, Pfunc) just prior to p.
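The choice between the two cases, and the insertion of prederivative code in case 2, can be sketched in Python. In this hypothetical model, statements are strings, each mod is a predicate on a statement, and each preD is a function producing the code to insert. The Init table is a dictionary from form names to (def, Deriv) pairs. The form names and Deriv entries below are invented for illustration.

```python
def choose_init(init_table, f, last_sub_var):
    """Pick case 1 or case 2 for initializing an expression matched by form f.

    init_table maps a form to (def_macro, deriv), where deriv maps pattern
    variables to lists of (mod, preD) pairs.  last_sub_var is the pattern
    variable of f matching the outermost reducible subexpression e' whose
    initialization comes last in the prologue, or None if e has none.
    """
    def_macro, deriv = init_table[f]
    if last_sub_var is None or last_sub_var not in deriv:
        return ("case 1", def_macro)
    return ("case 2", deriv[last_sub_var])


def insert_prederivatives(init_code, pairs):
    """Case 2: before each statement p of the initialization code for e'
    that some mod matches, insert the code produced by the (unique) preD."""
    out = []
    for stmt in init_code:
        hits = [preD for mod, preD in pairs if mod(stmt)]
        if len(hits) > 1:
            raise ValueError(f"more than one [mod, preD] pair matches {stmt!r}")
        if hits:
            out.append(hits[0](stmt))  # Expand(preD, Pfunc), just prior to p
        out.append(stmt)
    return out


# Hypothetical Deriv entries reproducing the c-updates of (20) from (19).
pairs = [
    (lambda p: p == "f(k) := f(k) + {w};", lambda p: "c(k) := c(k) + 1;"),
    (lambda p: p == "f(k) := {w};", lambda p: "c(k) := 1;"),
]
init_table = {"Form10": ("def10", {"x1": pairs}), "Form1": ("def1", {})}
case, rule = choose_init(init_table, "Form10", "x1")
if case == "case 2":
    print(insert_prederivatives(["f(k) := f(k) + {w};", "f(k) := {w};"], rule))
```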

In Appendix C (iii), we show an Init table included
as part of our SETL FD implementation, to be discussed in
the next section. Note that in this table, out of the eleven
basic forms contained in our F table (also shown in
Appendix C (iii)), only Init(Form10) and Init(Form11) have
nonnull second components. Nevertheless, the SETL case
studies of Section D and Appendix F suggest that the
differential initialization capability just described has
widespread utility.

It is of considerable importance for Algorithm 2' to
be able to find all elementary reducible expressions in L
whose values can be stored using the same variable.
(Note that the variables which hold values of reduced
expressions are generated by macro expansion during the
initialization phase of FD.) We call such expressions
'similar'. Occurrences of similar expressions can be replaced
by occurrences of the same simple variable or by map
retrieval terms using the same map variable.

Similar expressions which involve no discontinuities
can differ from each other only in the names of bound
variables. Similar expressions involving discontinuities
can also differ in their discontinuity subexpressions.
For example, the SETL expressions

(21)   {x ∈ s | f(x) = q * t}

and

(21')  {y ∈ s | f(y) = p + b}

are similar when s and f are induction variables and
q * t and p + b are discontinuity subexpressions. If
we choose to reduce (21) in a loop L, then we will insert
initialization code for (21) within the prologue to L
and derivative code for (21) within L. If C is the map
used to keep (21) available within L, then we can replace
occurrences of (21) within L by occurrences of the
retrieval C(q * t). But since (21') is similar to (21),
we can also replace occurrences of (21') by C(p + b).

To formulate a general method which locates similar
expressions and replaces them with simple variables or map
retrievals, we note first of all that each expression e
which Algorithm 2' selects for reduction is elementary,
a fact which simplifies our task. Suppose that an expression
e selected for reduction by Algorithm 2' is matched
by the elementary form f, and that the pattern variable map
Pfunc is obtained by executing Match(e, f, Pfunc). We define
the decomposition of e as the pair [f, Pfunc]. Let e'
be some other elementary reducible expression having the
pair [f', Pfunc'] as its decomposition. Then e' and e are
similar iff the following conditions hold:

(22)  1. f = f'

      2. Dom Pfunc = Dom Pfunc'

      3. Let Bnd(f) be the set of pattern variables of f
         which match bound variables of the expressions
         matched by f. For each nondiscontinuity parameter
         x belonging to Dom Pfunc, let t be a tree formed
         from Pfunc'(x) using the following substitutions:
         for each y ∈ (Bnd(f) * Dom Pfunc'), replace all
         occurrences of subtrees Pfunc'(y) within Pfunc'(x)
         by the subtree Pfunc(y). After all of this is done,
         Equals(Pfunc(x), t) must hold, where Equals is
         the tree equality test given in Appendix E (ii).

Moreover, if we assume that two expressions g and h decompose
into the pairs [f, Pfunc_g] and [f, Pfunc_h], we can test for
similarity between g and h by executing Sim(f, Pfunc_g, Pfunc_h),
where Sim is a boolean-valued procedure which returns True
if conditions 2 and 3 of (22) both hold.
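Conditions 2 and 3 of (22) can be sketched in Python on trees encoded as nested tuples, with the operator first. Two deviations are deliberate and should be read as assumptions. First, the bound-variable renamings are applied simultaneously rather than one at a time, which avoids interference when two bound variables swap names. Second, the set of discontinuity parameters is passed in explicitly. The form used in the example, {x ∈ x1 | E = q}, is hypothetical.

```python
def rename(tree, mapping):
    """Replace, simultaneously, every subtree equal to a key of mapping."""
    if tree in mapping:
        return mapping[tree]
    if isinstance(tree, tuple):            # (operator, operand, ...)
        return (tree[0],) + tuple(rename(t, mapping) for t in tree[1:])
    return tree


def sim(bound, disc, pfunc, pfunc2):
    """Conditions 2 and 3 of (22) for two maps obtained from the same form f.

    bound is Bnd(f); disc holds the discontinuity parameters of f, whose
    matched subexpressions are allowed to differ.
    """
    if pfunc.keys() != pfunc2.keys():                          # condition 2
        return False
    renaming = {pfunc2[y]: pfunc[y] for y in bound & set(pfunc2)}
    return all(rename(pfunc2[x], renaming) == pfunc[x]         # condition 3
               for x in pfunc if x not in disc)


# (21) and (21') decomposed against the hypothetical form {x ∈ x1 | E = q}
pf21 = {"x": "x", "x1": "s", "E": ("call", "f", "x"), "q": ("*", "q", "t")}
pf21p = {"x": "y", "x1": "s", "E": ("call", "f", "y"), "q": ("+", "p", "b")}
print(sim({"x"}, {"q"}, pf21, pf21p))     # True
```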

Using the test (22) for similarity, we can locate all
expressions similar to an expression e which is chosen by
Algorithm 2' for reduction. Then in step 7 of our algorithm,
we can replace all occurrences of e and of expressions
similar to e by occurrences of appropriate simple variables
or map retrievals. Let Similar(e) be the set containing e
and all reducible expressions similar to e.

Suppose that e is matched by the basic form f ∈ F; suppose
also that each expression g ∈ Similar(e) can be decomposed
into the pair [f, Pfunc_g]. To each map Pfunc_g we then add
the pair [E, t] (generated by expansion during initialization
for e), in which E is a pattern variable and t is the node
whose label Label(t) is the name of the variable used to keep
values of e available. Then, using a function Replace which
associates each elementary form h ∈ F with a replacement
macro Replace(h), we can replace each occurrence of every
expression g ∈ Similar(e) with the term
Expand(Replace(f), Pfunc_g). (Cf. the Replace patterns in
Appendix C (iii).)

Using the test for similarity that has just been
described, we can also keep track of reduced expressions
and avoid redundant reduction of similar expressions. This
is achieved by maintaining a set 'Reduced' of nonsimilar
reduced expressions e, each of which is represented by its
decomposition [f, Pfunc_e]. The first time Algorithm 2' selects
an expression e for reduction (in step 3), the set Reduced
is initialized to {[f, Pfunc_e]}, where f is the elementary
form matching e and Pfunc_e is the pattern variable map
obtained by matching f to e. Each subsequent time step 3
is performed, before choosing an expression e' (whose
decomposition is [f', Pfunc_e']) for reduction, we check
whether e' is similar to an expression already reduced. This
can be done by performing the test

(23)  ∃Pfunc_g ∈ Reduced{f'} | Sim(f', Pfunc_e', Pfunc_g)

If the value of (23) is true, we must perform the
indexed assignment Pfunc_e'(E) := Pfunc_g(E), which retrieves
from Pfunc_g (under the pattern variable E) the name of the
variable which can keep e' available, and stores it into
Pfunc_e'. Then we skip to step 7 of Algorithm 2' and replace
all occurrences of e' by occurrences of
Expand(Replace(f'), Pfunc_e'). When (23) does not hold, we
proceed to step 5, where we add the pair [f', Pfunc_e'] to
Reduced after first performing all the other subtasks
specified for this step.
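The bookkeeping around test (23) can be sketched as follows. Reduced is modeled as a dictionary from forms to lists of pattern variable maps, and the similarity test is passed in as a parameter. In the usage below, toy_sim is a hypothetical stand-in that ignores bound-variable renaming, and the form name "Form3" is invented.

```python
def reduce_or_reuse(reduced, f, pfunc, sim, new_name):
    """Apply test (23) to an expression e' with decomposition [f, pfunc].

    reduced maps each form to the maps Pfunc_g already recorded for it,
    so reduced.get(f, []) plays the role of Reduced{f'}.
    """
    for pfunc_g in reduced.get(f, []):
        if sim(f, pfunc, pfunc_g):
            pfunc["E"] = pfunc_g["E"]        # Pfunc_e'(E) := Pfunc_g(E); go to step 7
            return "reuse"
    pfunc["E"] = new_name()                  # step 5: a new variable for e'
    reduced.setdefault(f, []).append(pfunc)  # add [f', Pfunc_e'] to Reduced
    return "reduce"


def toy_sim(f, p1, p2, disc=frozenset({"q"})):
    """Hypothetical stand-in for Sim: compare everything except E and disc."""
    def strip(p):
        return {k: v for k, v in p.items() if k != "E" and k not in disc}
    return strip(p1) == strip(p2)


reduced = {}
names = iter(["C", "D"])
e1 = {"x1": "s", "q": "q*t"}      # like (21)
e2 = {"x1": "s", "q": "p+b"}      # like (21')
e3 = {"x1": "r", "q": "a"}
for e in (e1, e2, e3):
    print(reduce_or_reuse(reduced, "Form3", e, toy_sim, lambda: next(names)), e["E"])
```

The second expression reuses the map C introduced for the first, so only two variables are generated for three expressions.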

We conclude this section by describing some heuristics
for constructing a table F of elementary forms which can be
included as part of an FD implementation for a programming
language P. These are as follows:

1. Define some minimal set E of applicative expressions
in P, where each expression in E contains no more than one
occurrence of the same variable.

2. For each expression f ∈ E, determine a set DS(f) of
data structures, each capable of storing values of f.

3. For each expression f(x1,...,xn) belonging to E, for
each data structure d ∈ DS(f), and for each variable xi,
i = 1,...,n, determine empirically the kinds of modifications
to xi for which the value of f stored in data structure
d can be updated at a cost much less than the cost
of a fresh calculation of f. Let Cont be the set of pairs
[xi, mod], where mod is a pattern representing a distinct
kind of change in xi on which f depends continuously.

4. For each pair p := [x, mod] belonging to Cont, let
Removable(p) denote all the variables y of f (where y ≠ x)
which we know to be removable discontinuity variables
relative to the change mod in the variable x.

5. Then proceed in the manner indicated in the following
code to compute a set 'Forms' containing pairs, each of
which can be used to construct an elementary expression
form for our F table:


      Forms := nullset;
      (∀p ∈ Dom Removable)          /* Initialize Forms */
          Forms := Forms with [{p}, Removable(p)];
      end ∀;
      (while ∃t ∈ Forms, q ∈ Forms | t ≠ q &
                 (t(1) + q(1)) ∉ Dom Forms)
          Forms := Forms - {[Conts, Disconts] ∈ Forms |
                        (t(1) + q(1)) Incs Conts &
                        (t(2) * q(2)) = Disconts}
                   + {[t(1) + q(1), t(2) * q(2)]};
      endwhile;

6. Having computed Forms in the preceding step, we can
construct an elementary pattern form g(y_1, ..., y_n) for
each pair [Conts, Disconts] belonging to Forms. We
construct this pattern g using the same tree structure
and literal symbols as occur in f. However, for i = 1, ..., n,
y_i is treated as a pattern variable restricted according
to the following three cases:

(i) If x_i belongs to the set Disconts, then y_i is a
discontinuity parameter.

(ii) If x_i is contained in Dom Conts, then y_i is a continuity
parameter, and the value of any variable matched by y_i
can only vary according to the modification patterns found
in Conts{x_i}.

(iii) If neither (i) nor (ii) applies, y_i must match
a region constant.
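Step 6 can be sketched in the same hedged way. The Python function below is hypothetical: it assumes Conts is a set of (variable, mod) pairs and Disconts a set of variable names, and it checks case (i) before case (ii), an ordering the text does not state explicitly.

```python
def classify_parameters(xs, conts, disconts):
    """Sketch of step 6: decide how each pattern variable y_i may match.

    xs       -- the variables x_1, ..., x_n of f, in order
    conts    -- set of (variable, mod) pairs (the Conts part of a form)
    disconts -- set of variables (the Disconts part of a form)
    Returns one (kind, allowed_mods) pair per x_i.
    """
    result = []
    for x in xs:
        mods = frozenset(m for v, m in conts if v == x)   # Conts{x}
        if x in disconts:                                 # case (i)
            result.append(("discontinuity", frozenset()))
        elif mods:                                        # case (ii): x in Dom Conts
            result.append(("continuity", mods))
        else:                                             # case (iii)
            result.append(("region constant", frozenset()))
    return result
```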
(C) Implementation Design for SETL

In this section we build an implementation design of
semiautomatic FD for SETL based on the general remarks
of the preceding section. This design incorporates
several of the transformations studied in Chapter II (D)
and regards differentiation of general set formers

(1)  {x ∈ s | K(x, t1, ..., tn)}

as being of primary importance.

As noted in Chapter II (D), the cost of executing
(1) repeatedly in a loop L is proportional to
N × (#s) × Cost(K), where N is the iteration count of L.
The FD transformations applied by our system will keep
the value of (1) available in L in either (N + #s) × Cost(K)
or (N + (#s) × log(#s)) × Cost(K) elementary steps,
which will usually imply a speedup.
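To give a feel for these bounds, take some purely illustrative values (not from the source): N = #s = 1000, Cost(K) = 1, and logarithms to base 2.

```latex
\begin{aligned}
N \times \#s \times \mathrm{Cost}(K) &= 1000 \times 1000 = 10^{6},\\
(N + \#s) \times \mathrm{Cost}(K) &= 1000 + 1000 = 2000,\\
(N + \#s \log_2 \#s) \times \mathrm{Cost}(K) &\approx 1000 + 1000 \times 9.97 \approx 1.1 \times 10^{4}.
\end{aligned}
```

For these values either incremental bound is roughly two orders of magnitude below the cost of recomputing (1) on every iteration.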

The  FD  design  to  be  described  in  this  section  also 
aims  to  minimize  the  number  of  elementary  reducible  forms 
in  F  and  the  variety  of   transformations  embedded  in  the 
D  table  without  sacrificing  power.  This  is  achieved  by 
using the single form (1) to handle set intersection, set
difference, and other set-theoretic operations. Moreover,
we exploit the fact that whenever iterative expressions
such as the arithmetic sum

(1')  [+: x ∈ s | K(x, q1, ..., qn)] e(x)



are reducible, we can also reduce (1). But instead of
using exhaustive elementary form and derivative entries
for both (1) and (1'), we only need to specify all the
entries related to (1) and the D table entries associated
with the following two additional elementary expression
forms,

[+: x ∈ s] e(x)   and   [+: x ∈ K(q1, ..., qn)] e(x)

for (1'). Thus, the F table given in Appendix C(iii)
contains only 11 elementary forms.

To  differentiate  expressions  in  a  Subsetl  program  P, 
we  must  preprocess  P  by  transforming  expressions  of  P 
into  one  of  the  11  basic  forms  of  F.   Many  of  the 
'preparatory'  transformations  used  for  this  purpose  are 
given  in  Appendix  D  (v) .   Since  one  FD  transformation 
can potentially lead to another, the code produced by FD
will contain expressions forced to match the elementary
forms of F, and will tend to be hard to read. Thus, after
FD is done, we will want to sweep up the leftover
transformational debris by applying a battery of cleanup
transformations (cf. Appendix D (vi) for examples).

For simplicity we will treat both preparatory and
cleanup transformations semiautomatically. The sample
user/system interaction for manipulating the Topological
Sort program in Chapter III (B) exemplifies the effort
required to apply transformations preparing for FD and
cleanup transformations afterwards.

To  handle  discontinuous  SUBSETL  expressions,  our  FD 
design  will  use  the  following  mechanisms: 

1.  The  procedure  Regconst  given  in  Appendix  E  (iv)  for 
determining  the  set  RC  of  region  constants  in  L. 

2. The routine ISETL, which is the same procedure as
Algorithm 1' of the preceding section except for minor
adjustments described later in this section. ISETL
determines the set Cands of reducible expressions in L.

3. The reduction procedure 2SETL (essentially the same
as Algorithm 2' given in Section B) for reducing members
of Cands in L.

We  will  also  make  use  of  the  following  components  tailored 
exclusively  to  Subsetl: 

4.  The  F,  D,  Init,  and  Replace  tables  given  in 
Appendix  C(iii). 

5.  A  procedure  for  determining  sets  of  induction  variables 
for Subsetl. (For the sake of efficiency, we avoid using
the language-independent induction set procedure found in
Appendix E (iv).)

Since parts 1-4 of our FD design are discussed
elsewhere, we begin with remarks about part 5. Because a decision
procedure for induction variables depends on knowing the
types of variables, we first perform a type analysis. This
can be based either on Tenenbaum's method [T1] or on some
system of type declarations. Observation of the 11 elementary
expression forms in F and their associated derivatives in
the D table indicates that only seven sets of induction
variables are needed. These may be defined as follows:

IV1 = {all set variables x that only undergo changes of
the form x := x ± A};

IV2 = {all map variables f experiencing only indexed
assignments in L};

IV3 = {integer variables x which only change according
to the rule x := x ± A};

IV4 = {all set valued maps f(q1, ..., qk) which only have
definitions of the form
f(q1, ..., qk) := f(q1, ..., qk) ± A};

IV5 = {positive integer valued 1-ary maps f(x) only
undergoing modifications f(x) := f(x) ± A, where A is
a positive integer};

IV6 = {all set variables x that only undergo 'strict' set
additions and deletions, x := x ± A};

IV7 = {all set valued maps f(q1, ..., qk) that only have
definitions of the form f(q1, ..., qk) := f(q1, ..., qk) ± A
which are 'strict'}.

Once these seven induction sets have been calculated,
we can find all the reducible expressions in L by using
ISETL. Although ISETL closely follows the logic of
Algorithm 1', there are differences at step 4, in which
induction expressions are determined. In this step ISETL
will classify the nodes n which correspond to reducible
expressions into the sets described above, according to
the following cases which arise for Subsetl:

1. n is matched by Form1 of the F table. In this case,
if there are no discontinuities, place n in IV6 and IV1;
otherwise place n in IV7 and IV4.

2. n is matched by Form2, ..., or Form7.
Place n in IV7 and also IV4.

3. n is matched by Form8. n goes into IV6 and IV1.

4. n is matched by Form9. Insert n into IV3.

5. n is matched by Form10. Put n into IV5.

6. n is matched by Form11. Add n to both IV4 and IV7.


ISETL must also take an extra precautionary measure in
order to recognize expressions e which depend on a set or tuple
valued variable x having multiple occurrences in e.
As an example of such an expression, consider

(3)  c = {x ∈ s | f(x) ∉ s}

occurring in a program loop L in which f is invariant
and s only varies by set additions of the form s := s + A.
In accordance with the method of Chapter 3 (C), we first
number the two occurrences s1 and s2 of s within (3).


Next we examine the prederivative code preD1 and preD2
for (3) relative to the changes s1 := s1 + A and s2 := s2 + A
respectively. The D table of Appendix C (iii) shows that
there are two cases to consider: that in which the set
addition s := s + A is strict (i.e., the predicate
s * A = nullset holds just prior to the change in s)
and that in which it is not strict. In the first case,
preD1 and preD2 are

(4)   /* preD1 */
      (∀x ∈ A | f(x) ∉ s2)
         c := c + {x};
      end ∀;

and

(4')  /* preD2 */
      (∀y ∈ A, x ∈ {u ∈ s1 | f(u) = y})
         c := c - {x};
      end ∀;


In the latter case, preD1 and preD2 are given by

(5)   /* preD1 */
      (∀x ∈ (A - s1) | f(x) ∉ s2)
         c := c + {x};
      end ∀;

and

(5')  /* preD2 */
      (∀y ∈ (A - s2), x ∈ {u ∈ s1 | f(u) = y})
         c := c - {x};
      end ∀;
In both cases we seek to arrange preD1
and preD2 together with s := s + A in an appropriate
order which makes it convenient to replace all occurrences
of s1 and s2 within preD1 and preD2 by occurrences of s.
(Recall that this is an optimal form of (6), Chapter 3(C).)
In the first case, this can be achieved by surrounding
the code s := s + A by (4) and (4') in either order.
However, since (5) and (5') both depend on s1 and s2, we
cannot find an optimal derivative code placement which
avoids an extraneous and potentially costly copy operation.
Thus, when the latter case arises we will not want to
mark (3) 'reducible'.
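The strict case can be checked with a small Python sketch. It assumes f is an invariant total function stored as a dictionary, and it precomputes an inverse map inv, with inv[y] = {u | f(u) = y}, which plays the role of the setformer {u ∈ s1 | f(u) = y} once the test u ∈ s is applied. The names strict_add and make_inverse are hypothetical. Here (4') runs before the change and (4) after it, one of the two orders the text allows.

```python
from collections import defaultdict

def make_inverse(f):
    """inv[y] = {u : f(u) = y}; f is invariant, so this is built once."""
    inv = defaultdict(set)
    for u, y in f.items():
        inv[y].add(u)
    return inv

def strict_add(s, c, a, f, inv):
    """Apply s := s + a (assumed strict) and keep c = {x in s | f(x) notin s}.

    s and c are updated in place.
    """
    assert not (a & s), "only the strict case (a * s = nullset) is handled"
    # (4'), before the change: occurrence s1 still denotes the old s
    for y in a:
        for x in inv.get(y, ()):
            if x in s:
                c.discard(x)
    s |= a                      # the change s := s + a
    # (4), after the change: occurrence s2 now denotes the new s
    for x in a:
        if f[x] not in s:
            c.add(x)
```

A usage check is simply to compare c with a fresh evaluation of (3) after every change.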

Setformers such as (3) are used widely enough in SETL
algorithms to warrant a few additional remarks. It is
possible to reduce (3) by taking either of the following
two approaches.

1. Choose less efficient or less desirable derivative
code which makes use of fewer parameters than the standard
derivative code.

2. Recognize that uses of parameters within derivative
code may be eliminated by transformation.

The first approach allows us to reduce (3) by noting
that the derivative code (4) and (4') can be used to replace
(5) and (5'). However, if we do this, c will no longer
belong to IV6, and an outer expression such as [+: x ∈ c] 1
which depends on c may require more complicated reduction.
The second approach avoids the disadvantages of the
first, but is trickier to apply. In the case of expression
(3), we can observe that the occurrence of s1 in (5') is
contained within the setformer c'(y) = {u ∈ s1 | f(u) = y},
which must be reduced. Consequently, after reduction of c',
the unwanted occurrence of s1 will disappear. Thus, we
can treat preD1 as involving s1 and s2, and preD2 as involving
just s2. A profitable derivative for (3) can then be
formed by using preD1 followed by preD2, which precedes
the change to s. Moreover, we must place the derivative
code for c'(y) in between preD1 and preD2.

We will consider the two approaches just described
as future possibilities for an expanded FD system. In
any case, we will abide by the following general rule: if
an expression e is matched by a basic form f, more than one
of whose continuity parameters match different occurrences
of x in e, then we will consider e undifferentiable if
our reduction techniques (cf. the discussion following (3)
of Chapter 3(C)) would fail to remove costly copy
operations,

(2)  x_old := x

from the derivative code for e. Although we have not
provided a general decision procedure for doing this which
works better than the obvious exponential method,
expressions involving multiple occurrences of a single variable
are apt to be uncommon.

Once all reduction candidate expressions are found
in L, a user can select a particular candidate, EXP, for
reduction by issuing the following command:

(6)  $FD, LOOP#, NAME = EXP.

This command will be passed to the formal differentiation
transformation generators, and will be processed by 2SETL
(which is essentially a renaming of Algorithm 2' of the
preceding section) using the Subsetl tables F, D, Init,
and Replace of Appendix C (iii), the sets RC, IV1, IV2, ..., IV7,
and the map Reduce generated by ISETL.

In  the  next  section  we  will  illustrate  our  SETL  FD 
design  by  presenting  four  program  derivations  as  case 
studies.   However,  before  doing  this  it  may  be  helpful  to 
make   a  few  explanatory  comments  about  the  SETL  F  and  D 
tables  given  in  Appendix  C  (iii)  . 

Because  the  setformer   is  a  fundamental  building  block 
used  to  construct  base   forms  of  algorithms,  the  most 
important  elementary  form  appearing  in  the  F  table  is  the 
setformer pattern Form1. Form1 matches generalized set
formers which cover many of the set former constructs
studied in Chapter 2 (D). The boolean subparts of set
formers matched by Form1 must consist of a conjunction of
terms T, each of which is matched by a conjunct pattern
used in the definition of Form1. Recall from Chapter 2(D)
that differentiation of set formers is sometimes handled
differently for the two cases when a particular kind of
conjunct occurs only once or when it occurs several
times within T (cf. (49) of Chapter 2 (D)). However, we
have observed that whenever such a distinction is made
for a conjunct pattern p, we can handle multiple
occurrences of terms which match p by preparatory
transformations which reduce these multiple occurrences
to a single occurrence. We enforce this procedure
by designing Form1 to allow p to match successfully no
more than once. This is done by placing the special
pattern operator ! just prior to p within the definition
of Form1.

To illustrate the technique just mentioned, consider
the following set former:

(7)  {x ∈ s | f1(x) ∈ Q1 & f2(x) ∈ Q2 & ... & fn(x) ∈ Qn}

When n = 1, ISETL can recognize (7) as a differentiable
expression matched by Form1. However, for n > 1, (7)
will not be marked 'reducible'. Thus, to reduce (7) we
must first manually select transformations which rename
fi(x) and Qi as 'shadow variables' f'(x,i) and Q'(i),
i = 1, ..., n. Next we transform the conjunction
f'(x,1) ∈ Q'(1) & ... & f'(x,n) ∈ Q'(n) into the following
intermediate form,

(8)   [+: 1 ≤ i ≤ n | f'(x,i) ∉ Q'(i)] 1 = 0

which passes into the differentiable form

(8')  [+: y ∈ {1 ≤ i ≤ n | f'(x,i) ∉ Q'(i)}] 1 = 0.

(Cf. (45), (45'), (47), (47') and (49) of Chapter 2(D)
for further discussion.)
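The equivalence behind (7) and (8') can be checked directly with a Python sketch: a conjunction holds exactly when the set of failing conjuncts is empty. The function names are hypothetical, and the lists fs and qs stand in for the shadow variables f'(x,i) and Q'(i).

```python
def setformer_conjunction(s, fs, qs):
    """(7): {x in s | f1(x) in Q1 & ... & fn(x) in Qn}."""
    return {x for x in s if all(f(x) in q for f, q in zip(fs, qs))}

def setformer_counting(s, fs, qs):
    """(8'): keep x when [+: y in {1 <= i <= n | f'(x,i) notin Q'(i)}] 1 = 0."""
    n = len(fs)

    def failing(x):
        # Shadow variables: fs[i - 1] plays f'(., i), qs[i - 1] plays Q'(i)
        return {i for i in range(1, n + 1) if fs[i - 1](x) not in qs[i - 1]}

    return {x for x in s if sum(1 for _ in failing(x)) == 0}
```

The counting form replaces n separate membership conjuncts by a single count test, which is the shape Form1 is designed to accept.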

Although  the  FD  tables  of  Appendix  C  (iii)  allow  us 
to  handle  a  wide  variety  of  expressions  and  algorithms, 
these  tables  are  by  no  means  complete;  indeed,  our  F 
table  omits  several  elementary  forms  which  generalize 
expressions  studied  in  Chapter  2  (D)  (cf.,  (49)  of 
Chapter  2  (D) ) ,  and  our  D  table  lacks  entries  which  could 
handle  Rule  2.   These  omissions   simplify  our  initial 
implementation  design,   and  can   be  easily  remedied 
in  the  future . 
(D) Applications of Formal Differentiation to
Algorithm  Development 

To come to terms with the question of how automatically
FD can be applied, we consider a simple example:
Knuth's Topological Sort. (This example is also studied
by Earley [E1].) The input assumed by this algorithm is
a set s and a set of pairs sp representing an irreflexive
transitive relation defined on s; as output, it produces
a tuple t in which the elements of s are arranged in a
total order consistent with the partial order sp. A
concise SETL form of the algorithm is as follows:

(1)  t := nulltuple;
     (while ∃a ∈ s | (sp{a} * s) = nullset)
        t := t + [a];     /* tuple concatenation */
        s := s - {a};
     end while;

In  Chapter  3  (B)  we  showed  how  a  user  of  our  proposed 
system  could  transform  (1)  semimanually  into  the  following 
form which is better suited to FD:

(2)  1  t := nulltuple;
     2  (while ∃a ∈ {x ∈ s | [+: y ∈ {z ∈ sp{x} | z ∈ s}] 1 = 0})
     3     t := t + [a];
     4     s := s - {a};
     5  end while;

At this point a user could differentiate (2) by issuing
the command

(3)  $FD, 2, Zrcount = {x ∈ s | [+: y ∈ {z ∈ sp{x} | z ∈ s}] 1 = 0}


The system will have computed the sets RC = {sp}
and IV1 = {s}. Algorithm ISETL will have marked the
following expressions reducible:

c1(x) = {z ∈ sp{x} | z ∈ s},
c2(x) = [+: y ∈ c1(x)] 1, and
c3 = {x ∈ s | c2(x) = 0}.

Expression c1(x) matches basic form 11 of the F table
(cf. Appendix C(iii)), c2(x) corresponds to form 10,
and c3 is of the first basic form.

Algorithm 2SETL, invoked by the user directive (3),
will first validate this directive. Then Zrcount will be
reduced, starting from its inner and proceeding to its
outer reducible subexpressions. To reduce the innermost
subexpression c1(x), the system needs to differentiate c1
with respect to the change s := s - {a} occurring at
line 4 of (2), and also needs to initialize c1 just prior
to line 2. The system will use entry 11b in the D table
to generate the following prederivative code:

(4)  (∀y ∈ ({a} * s), u ∈ {z ∈ Dom sp | y ∈ sp{z}})
        c1(u) := c1(u) - {y};
     end ∀;

The following initializing code will also be obtained from
the Init table entry 11a:

(5)  (∀x ∈ Dom sp) c1(x) := {z ∈ sp{x} | z ∈ s};
     end ∀;

However, since the derivative table entry 11b stipulates
that the subexpression c4(y) = {z ∈ Dom sp | y ∈ sp{z}}
of (4) must also be reduced, the system will insert the
following initializing code for c4 at the end of the
prologue to the while loop L:

(6)  c4 := nullset;
     (∀z ∈ Dom sp, y ∈ sp{z} | z ∈ Dom sp)
        if y ∈ Dom c4 then
           c4(y) := c4(y) + {z};
        else
           c4(y) := {z};
        endif;
     end ∀;

(Note that (6) is based on entry 1 of the Init table.)
No derivative code for c4 is required, however, because
c4 is invariant in L. The system will request the user
to supply a variable name for c4. After all this, the
code sequence (4) will be transformed into a more
efficient form:

(7)  (∀y ∈ ({a} * s), u ∈ c4(y))
        c1(u) := c1(u) - {y};    /* strict deletion */
     end ∀;
Next, proceeding from inner to outer subexpression
of Zrcount, the reduction procedure aims to differentiate
c2(x). Since c2 only depends on the change in c1(u)
which occurs within (7), entry 10b of the D table is
applicable and yields the special prederivative

(8)  c2(u) := c2(u) - [+: x ∈ {y}] 1

This exploits the fact that the element deletion within (7)
is strict, where we say that a set addition x := x + A
(or set deletion x := x - A) is strict if the precondition
A * x = nullset (respectively, A ⊆ x) holds.
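A small Python sketch of the strictness conditions, and of why (8) may subtract exactly [+: x ∈ {y}] 1 = 1: under a strict deletion the element really leaves c1(u), so its size drops by one. The helper names are hypothetical and the maps c1, c2 are plain dictionaries.

```python
def is_strict_addition(x, a):
    """x := x + a is strict when a * x = nullset."""
    return not (x & a)

def is_strict_deletion(x, a):
    """x := x - a is strict when a is a subset of x."""
    return a <= x

def delete_with_count(c1, c2, u, y):
    """Delete y from c1[u] while keeping c2[u] = #c1[u], as in (7) and (8)."""
    assert is_strict_deletion(c1[u], {y}), "(8) is only valid for strict deletion"
    c2[u] -= 1              # prederivative (8), executed just before the change
    c1[u] = c1[u] - {y}     # the change itself, from (7)
```

Without the strictness guarantee, deleting an absent y would leave c1(u) unchanged while the count still dropped, which is why the special entry depends on it.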

The system will now detect that c1 has no uses within
(8) and can therefore be eliminated. The initializing code
for c2 (obtained from entry 10a of Init), which is

(9)  (∀x ∈ Dom sp) c2(x) := [+: y ∈ {z ∈ sp{x} | z ∈ s}] 1;
     end ∀;

replaces (5), and the assignment to c1(u) within (7) is
removed.

Finally, c3 is prepared for reduction. Its
prederivatives, which are inserted within L, are

(10)  (∀x ∈ ({a} * s) | c2(x) = 0)    /* change to s */
         c3 := c3 - {x};
      end ∀;

(cf. 1b of D) and

(10') if u ∈ s & c2(u) = [+: x ∈ {y}] 1 then
         c3 := c3 + {u};    /* change to c2 */
      endif

(cf. 1hh of D).

Since both (10) and (10') contain uses of c2, c2 will
not be eliminated. Consequently, the system will request
the user to supply a variable name for c2. It will also
initialize c3 by inserting the following assignment
(based on entry 1 of Init),

(11)  c3 := {x ∈ s | c2(x) = 0};

at the end of the prologue to L.

If we assume that the user supplies the names succ
and count for c4 and c2 respectively, then the following
much improved version of the topological sort (2) will be
produced by the single user directive (3):

(12)  t := nulltuple;
      /* prologue */
      succ := nullset;    /* successor map */
      (∀z ∈ Dom sp, y ∈ sp{z} | z ∈ Dom sp)
         if y ∈ Dom succ then
            succ(y) := succ(y) + {z};
         else
            succ(y) := {z};
         endif;
      end ∀;
      (∀x ∈ Dom sp) count(x) := [+: y ∈ {z ∈ sp{x} | z ∈ s}] 1;
      end ∀;
      Zrcount := {x ∈ s | count(x) = 0};
      /* main loop */
      (while ∃a ∈ Zrcount)
         t := t + [a];
         (∀y ∈ ({a} * s), u ∈ succ(y))
            if u ∈ s & count(u) = [+: x ∈ {y}] 1 then
               Zrcount := Zrcount + {u};
            endif;
            count(u) := count(u) - [+: x ∈ {y}] 1;
         end ∀;
         (∀x ∈ ({a} * s) | count(x) = 0)
            Zrcount := Zrcount - {x};
         end ∀;
         s := s - {a};
      end while;


The topological sort shown in (12) can now be cleaned
up in a way similar to the program manipulation performed
on the second topological sort version of Chapter 3 (B).
The code which then results is:

(13)  t := nulltuple;
      /* prologue */
      succ := nullset;
      (∀z ∈ Dom sp, y ∈ sp{z})
         if y ∈ Dom succ then
            succ(y) := succ(y) + {z};
         else
            succ(y) := {z};
         endif;
      end ∀;
      (∀x ∈ Dom sp) count(x) := [+: y ∈ sp{x} | y ∈ s] 1;
      end ∀;
      Zrcount := {x ∈ s | count(x) = 0};
      /* main loop */
      (while ∃a ∈ Zrcount)
         t := t + [a];
         (∀u ∈ succ(a))
            if count(u) = 1 then
               Zrcount := Zrcount + {u};
            endif;
            count(u) := count(u) - 1;
         end ∀;
         Zrcount := Zrcount - {a};
      end while;


This final version of the topological sort algorithm
will run in a number of cycles proportional to the number
nsp of elements in the map sp. The original form (1) of
the algorithm will require something like nsp × (#s) × (#s)
cycles, which can be much larger. However, the symbolic
chain of transformations going from (1) to (2) and from
(12) to (13) is somewhat tedious (cf. Chapter 3).
Moreover, it does not seem likely that these preparatory and
cleanup transformations can be applied in a completely
automatic way.
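For comparison, here is a hedged Python rendering of versions (1) and (13). It assumes sp is a set of pairs (x, z) with z ∈ sp{x} and sp ⊆ s × s, and it treats count(x) as 0 for elements outside Dom sp (in the SETL text such a count would be undefined). The function names are hypothetical. The naive version rescans sp for every candidate, while the FD version builds succ and count once and then touches each pair of sp roughly a bounded number of times.

```python
from collections import defaultdict

def topsort_naive(s, sp):
    """Version (1): repeatedly pick a in s with (sp{a} * s) = nullset."""
    s, t = set(s), []
    while True:
        a = next((a for a in s
                  if not {z for (x, z) in sp if x == a} & s), None)
        if a is None:
            return t
        t.append(a)          # t := t + [a]
        s.discard(a)         # s := s - {a}

def topsort_fd(s, sp):
    """Version (13): succ and count are maintained incrementally."""
    succ = defaultdict(set)            # succ(y) = {z : y in sp{z}}
    for z, y in sp:
        succ[y].add(z)
    count = defaultdict(int)           # count(x) = #{y in sp{x} | y in s}
    for x, y in sp:
        if y in s:
            count[x] += 1
    zrcount = {x for x in s if count[x] == 0}
    t = []
    while zrcount:
        a = next(iter(zrcount))
        t.append(a)
        for u in succ[a]:
            if count[u] == 1:          # u is about to lose its last predecessor
                zrcount.add(u)
            count[u] -= 1
        zrcount.discard(a)
    return t
```

Both functions return some order consistent with sp; when several elements are ready at once, which one comes first depends on set iteration order, just as the SETL existential leaves the choice open.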

Some of the manual effort these transformations
require might, however, be alleviated by integrating some
of the transformations found in Appendix D (v) and (vi)
into the FD algorithms. For example, since an
existential quantifier Q contains the same variables which
would appear in the set former F implied by Q, the marking
algorithm ISETL can certainly determine whether F is
differentiable or not. If it is, then ISETL can initiate
transformation P6 of Appendix D (v), which will transform Q into
a form which exposes F. Likewise, after each of its steps,
the reduction routine 2SETL can attempt to apply all of
the cleanup transformations of Appendix D (vi) within
the localized program regions just changed.

Another example closely related to the topological
sort is the transitive closure algorithm found in [Sch1].
This algorithm takes a set s and a multivalued mapping f
as input. As output, it produces the smallest set s'
which includes s and also includes the image of s' under f.
A succinct SETL version of this algorithm is as follows:

(14)  1  (while ∃a ∈ s | s ≠ s + f{a})
      2     s := s + f{a};
      3  end while;

In order to prepare (14) for FD, a user might instruct
the system to apply the following sequence of transformations:

1. Turn the predicate s ≠ s + f{a} appearing in line 1
into the more convenient form f{a} - s ≠ nullset by applying
E4, D4, N4, and N1 of Appendix D (i).

2. P18, P8, and P6 of Appendix D (v) can then be used to
transform the while loop predicate into a form suitable for FD.

The result of these steps is the following version of (14):

(15)  1  (while ∃a ∈ {x ∈ s | [+: z ∈ {y ∈ f{x} | y ∉ s}] 1 ≠ 0})
      2     s := s + f{a};
      3  end while;
In (15), we note just as in the previous example that
RC = {f} and IV1 = {s}. Procedure ISETL will mark
the following expressions 'reducible':

c1(x) = {z ∈ f{x} | z ∉ s},
c2(x) = [+: y ∈ c1(x)] 1, and
c3 = {x ∈ s | c2(x) ≠ 0},

where c1(x) and c3 differ (but only slightly) from their
counterparts in the topological sort.

Then if a user issues the directive

(16)  $FD, 1, Differ = {x ∈ s | [+: z ∈ {y ∈ f{x} | y ∉ s}] 1 ≠ 0},

2SETL will go through essentially the same steps as for (3).
Of course, a slightly different derivative code sequence
will be obtained for c1. However, the reduction actions
triggered by (16) will not lead to optimally efficient
code. This is because the prederivative of c1 with respect
to s := s + f{a}, i.e.,

(17)  (∀y ∈ (f{a} - s), u ∈ {z ∈ Dom f | y ∈ f{z}})
         c1(u) := c1(u) - {y};
      end ∀;

contains a hidden occurrence, c1(a) = f{a} - s, of c1.
Unfortunately this occurrence will go undetected by
2SETL, and c1 will be prematurely eliminated as dead
following the reduction of c2. The code which results
will be correct, but it will fall short of the desired
efficiency. Thus, it is better to handle (15) by taking
smaller and more careful formal differentiation steps.

Another problem with (15) is that we cannot profitably
differentiate the expression s + f{a} appearing in
line 2 of (15). This makes the assignment at line 2
potentially inefficient.

To make (15) more suitable for FD, a user can apply
the transformation P19 of Appendix D (v) to line 2. The
code which results,

(18)  (∀x ∈ (f{a} - s))
         s := s + {x};    /* strict addition */
      end ∀;

can then be transformed still further by turning f{a} - s
into the canonical form {y ∈ f{a} | y ∉ s}, which is reducible.
(Transformation P18 of Appendix D (v) accomplishes this
second task.)

Next, the user can issue the following FD directive:

(19)  $FD, 2, Prout = {y ∈ f{x} | y ∉ s}.

The code which results is as follows:
(20)      /* prologue */
          /* succ is supplied as the auxiliary map name
             needed for reduction of c1 */
      1   succ := nullset;
      2   (∀z ∈ Dom f, y ∈ f{z} | z ∈ Dom f)
      3      if y ∈ Dom succ then
      4         succ(y) := succ(y) + {z}; else
      5         succ(y) := {z};
      6      endif;
      7   end ∀;
          /* Prout corresponds to c1 */
      8   (∀x ∈ Dom f)
      9      Prout(x) := {z ∈ f{x} | z ∉ s};
      10  end ∀;
          /* main loop */
      11  (while ∃a ∈ {x ∈ s | [+: z ∈ Prout(x)] 1 ≠ 0})
      12     (∀x ∈ Prout(a))
      13        (∀z ∈ {x}, u ∈ succ(z))
      14           Prout(u) := Prout(u) - {z};
      15        end ∀;    /* prederivative of Prout */
      16        s := s + {x};
      17     end ∀;
      18  end while;

At this point one additional user command,

(21)  $FD, 11, Differ = {x ∈ s | [+: z ∈ Prout(x)] 1 ≠ 0}

will carry (20) into a penultimate version of (14).
An efficient final form is subsequently reached by
applying standard cleanup transformations.

It is useful to follow the FD actions triggered by
(21). Analysis of the main while loop (line 11) of (20)
will show that the set s belongs in IV6 and the map Prout is a
member of IV7. ISETL will discover two reducible
expressions: c2(x) = [+: z ∈ Prout(x)] 1, matching form 10 of F,
and c3 = {x ∈ s | c2(x) ≠ 0}, which is of the first elementary
form with c2 corresponding to the pattern F9. 2SETL will
first attempt to differentiate the inner expression c2
before it reduces c3. Based on entry 10b of the D table,
the system obtains the following prederivative code for c2
with respect to the change to Prout at line 14 of (20):

(22)  c2(u) := c2(u) - [+: w ∈ {z}] 1;

Note that, owing to the strict element deletion occurring
at line 14, (22) will be generated by a special derivative
code entry. The initialization code for c2, obtained
from entry 10a of Init, is

(23)  (∀y ∈ Dom Prout)
         c2(y) := [+: x ∈ Prout(y)] 1;
      end ∀;

which is placed at the end of the prologue. Finally, c3 will
be reduced as prescribed by entries 1a and 1jj of the D table.
The prederivative code for c3 with respect to the change
(22) is

(24)  if u ∈ s & c2(u) = [+: w ∈ {z}] 1 then
         c3 := c3 - {u};
      endif;

The prederivative code to update c3 relative to the
strict element addition to s is

(25)  (∀y ∈ {x} | c2(y) ≠ 0)
         c3 := c3 + {y};
      end ∀;

Since (24) and (25) contain uses of c2, c2 will remain
as a reduced subexpression of c3. The system will request
the user to supply a name for c2 and will also initialize
c3 at the end of the prologue. In response to a user
request, $UNPARSE, at this point the system would print
out the following more efficient version of (20):


(26)      /* prologue */
      1   succ := nullset;
      2   (∀z ∈ Dom f, y ∈ f{z} | z ∈ Dom f)
      3      if y ∈ Dom succ then
      4         succ(y) := succ(y) + {z}; else
      5         succ(y) := {z};
      6      endif;
      7   end ∀;
          /* Prout corresponds to c1 */
      8   (∀x ∈ Dom f)
      9      Prout(x) := {z ∈ f{x} | z ∉ s};
      10  end ∀;
          /* count is the user name for c2 */
      11  (∀x ∈ Dom Prout)
      12     count(x) := [+: z ∈ Prout(x)] 1;
      13  end ∀;
      14  Differ := {x ∈ s | count(x) ≠ 0};
          /* main loop */
      15  (while ∃a ∈ Differ)
      16     (∀x ∈ Prout(a))
      17        (∀z ∈ {x}, u ∈ succ(z))
      18           if u ∈ s & count(u) = [+: w ∈ {z}] 1 then
      19              Differ := Differ - {u};
      20           endif;
      21           count(u) := count(u) - [+: w ∈ {z}] 1;
      22           Prout(u) := Prout(u) - {z};
      23        end ∀;
      24        (∀y ∈ {x} | count(y) ≠ 0)
      25           Differ := Differ + {y};
      26        end ∀;
      27        s := s + {x};
      28     end ∀;
      29  end while;




To clean up the above code, one can apply the
following transformations, all of which are described
in Appendix D: C1 of Section vi will simplify
[+: w ∈ {z}]1 occurring at lines 18 and 21 to 1;
C2 of the same section will turn the loop at line 24 into
the conditional statement


if count(x) ≠ 0 then
   Differ := Differ + {x};
endif;

A similar transformation applies to loop 17. Finally, the
redundant element test z ∈ Dom f at line 2 can be
eliminated by a variant of transformation S2 of Section iii.
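To make the effect of the transformation concrete, here is a Python sketch of the differentiated loop (26), with Python sets and dictionaries standing in for SETL sets and maps. The function name closure_fd and the representation of f as a dict from nodes to sets of images are assumptions of this sketch, not part of the system described here. Each pass through the main loop adds the unreached images of some a ∈ Differ to s and patches Prout, count, and Differ locally, instead of re-evaluating {x ∈ s | count(x) ≠ 0} from scratch.

```python
from collections import defaultdict


def closure_fd(s, f):
    """Close s under f (s := s + f{a} until nothing changes), following (26).

    f maps each node to the set of its images; nodes outside Dom f have none.
    Prout(x) = {z in f{x} | z not in s}, count(x) = #Prout(x),
    Differ = {x in s | count(x) != 0}.
    """
    s = set(s)
    # Lines 1-7: succ is the inverse of f, succ(y) = {z | y in f{z}}.
    succ = defaultdict(set)
    for z, images in f.items():
        for y in images:
            succ[y].add(z)
    # Lines 8-14: initialize Prout, count, and Differ once, in the prologue.
    prout = {x: {z for z in images if z not in s} for x, images in f.items()}
    count = {x: len(zs) for x, zs in prout.items()}
    differ = {x for x in s if count.get(x, 0) != 0}
    # Lines 15-29: main loop.
    while differ:
        a = next(iter(differ))
        for x in list(prout[a]):          # iterate over a snapshot of Prout(a)
            # x is about to join s, so x leaves Prout(u) for every u with x in f{u}.
            for u in succ[x]:
                if u in s and count[u] == 1:
                    differ.discard(u)     # u's last unreached image disappears
                count[u] -= 1
                prout[u].discard(x)
            if count.get(x, 0) != 0:      # the new member x still has unreached images
                differ.add(x)
            s.add(x)
    return s
```

Every edge of f is examined a bounded number of times over the whole run, which is the kind of saving the cleaned-up (26) is meant to deliver.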

As a third example, we consider an algorithm which
finds an interval in a flow graph. Input to this algorithm
consists of a set V of nodes, an edge collection E
represented as a mapping, and a root node h. The algorithm
will output a set Int of nodes forming the interval in V
whose header node is h. A base form SETL version of this
algorithm can be written in the following way:

(27)  Int := {h};
      (while ∃ a ∈ (E[Int] - Int) | ({x ∈ (V - Int) | a ∈ E(x)}
                                       = null set))
         Int := Int + {a};
      end while;
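For comparison with the differentiated version derived below, a direct Python transcription of (27) may help. V, E, and h follow the text, with E assumed to be a dict from each node to its set of successors; the name interval_naive is hypothetical. Each iteration recomputes E[Int] - Int and, for every candidate, rescans V - Int for predecessors, which is exactly the repeated work formal differentiation is meant to remove.

```python
def interval_naive(V, E, h):
    """Interval with header h, computed directly from the base form (27).

    E maps each node to its set of successors.
    """
    Int = {h}
    while True:
        # E[Int] - Int: successors of the interval that are not yet in it
        frontier = set().union(*(E.get(w, set()) for w in Int)) - Int
        # pick any candidate all of whose predecessors already lie in Int
        a = next((a for a in frontier
                  if not {x for x in V - Int if a in E.get(x, set())}), None)
        if a is None:
            return Int
        Int.add(a)
```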
To put (27) into a form suitable for
FD, we change it by straightforward interactive
program manipulation into the following equivalent form:

(28)  1  Int := {h};
      2  (while ∃ a ∈ {x ∈ [+: w ∈ Int]E(w) | x ∉ Int
               & [+: z ∈ {y ∈ V | y ∉ Int & x ∈ E(y)}]1 = 0})
      3     Int := Int + {a};
      4  end while;

Though seemingly more complicated than the previous two case
studies, (28) can actually be differentiated using only one
user directive,

(29)  $FD, 2, New = {x ∈ [+: w ∈ Int]E(w) | x ∉ Int
               & [+: z ∈ {y ∈ V | y ∉ Int & x ∈ E(y)}]1 = 0}.

To process (29), the system will note that
IV1 = {Int} and RC = {E, V}. 2SETL will discover the
following four reducible expressions:

c1(x) = {y ∈ V | y ∉ Int & x ∈ E(y)},

c2(x) = [+: z ∈ c1(x)]1,

c3 = [+: w ∈ Int]E(w),

and

c4 = {x ∈ c3 | x ∉ Int & c2(x) = 0},

each of them matching a different elementary form found
in F.
The reduction procedure, 2SETL, will reduce the
innermost expressions c1 and c3 first. The prederivatives
of c1 and c3 relative to the modification of Int at
line 3 of (28) are


(30)  (∀y ∈ ({a} - Int), x ∈ E(y) | y ∈ V)
         c1(x) := c1(x) - {y};
      end ∀;

(obtained from entry 1e of D), and

(31)  c3 := c3 + ([+: w ∈ ({a} - Int)]E(w) - c3);

(30) and (31) will be inserted (in an arbitrary order)
just prior to line 3. Within the prologue to L, 2SETL
will insert the following code:


(31')  /* initialize c1 */
       c1 := null set;
       (∀y ∈ V, x ∈ E(y) | y ∉ Int)
          if x ∈ Dom c1 then
             c1(x) := c1(x) + {y};  else
             c1(x) := {y};
          endif;
       end ∀;

       /* initialize c3 */
       c3 := [+: w ∈ Int]E(w);
which makes c1 and c3 available on entrance to the while
loop L.

Next 2SETL will reduce c2. The prederivative of
c2 with respect to the assignment c1(x) := c1(x) - {y}
within (30) is given by the following code:


(32)  c2(x) := c2(x) - [+: w ∈ {y}]1;


Since the update code (32) makes no use of c1, c1 can be
eliminated from L and from the prologue P to L. To do this,
2SETL first inserts initializing code for c2. Because c2
depends on c1, and because c1 is defined incrementally within P,
c2 will be initialized by means of incremental entries
10b and 10c of Init. Once this has been accomplished,
c1 can be removed from P and L.

2SETL is now ready to reduce c4. However, before
we describe the reduction steps used for this purpose,
it is useful to pause and see what has already taken place.
The state of the interval program after the transformations
already noted is as follows:
(33)   1  Int := {h};
          /* prologue */
       2  c2 := nullset;                        /* define c2 */
       3  (∀y ∈ V, x ∈ E(y) | y ∉ Int)
       4     if x ∈ Dom c2 then
       5        c2(x) := c2(x) + [+: w ∈ {y}]1;  else
       6        c2(x) := [+: w ∈ {y}]1;
       7     endif;
       8  end ∀;
       9  c3 := [+: w ∈ Int]E(w);              /* define c3 */
          /* main loop */
      10  (while ∃ a ∈ {x ∈ c3 | x ∉ Int & c2(x) = 0})
      11     (∀y ∈ ({a} - Int), x ∈ E(y) | y ∈ V)
      12        c2(x) := c2(x) - [+: w ∈ {y}]1;
      13     end ∀;
      14     c3 := c3 + ([+: w ∈ ({a} - Int)]E(w) - c3);
      15     Int := Int + {a};
      16  end while;


But to continue: c4 matches elementary form 1 in F,
and it depends on parameters c3, Int, and c2, corresponding
to pattern variables x1, x2, and F8 of form 1.
The modifications of c3, Int, and c2 at lines 14, 15, and
12 of (33) lead to derivative code entries 1a, 1hh,
and 1e, respectively. These prederivative sequences are
as follows:
(34)   /* for the change to c3 */
       (∀z ∈ ([+: w ∈ ({a} - Int)]E(w) - c3) | z ∉ Int & c2(z) = 0)
          c4 := c4 + {z};
       end ∀;

(34')  /* for c2 */
       if x ∈ c3 & c2(x) = [+: w ∈ {y}]1 & x ∉ Int then
          c4 := c4 + {x};
       endif;

(34")  /* for Int */
       (∀y ∈ ({a} - Int) | y ∈ c3 & c2(y) = 0)
          c4 := c4 - {y};
       end ∀;

Since uses of c2 and c3 occur within (34), (34'), and (34"),
these values cannot be removed. The system will therefore
request the user to supply names for c2 and c3. Finally,
initialization code for c4 will be inserted at the end
of the prologue.

The  FD  command  (29)  will  therefore  initiate  system 
actions  taking  the  high  level  algorithm  (28)  into  the 
following  equivalent  concrete   version. 
(35)   1  Int := {h};
          /* prologue */
       2  predout := nullset;                /* corresponds to c2 */
       3  (∀y ∈ V, x ∈ E(y) | y ∉ Int)
       4     if x ∈ Dom predout then
       5        predout(x) := predout(x) + [+: w ∈ {y}]1;  else
       6        predout(x) := [+: w ∈ {y}]1;
       7     endif;
       8  end ∀;
       9  succ := [+: w ∈ Int]E(w);          /* same as c3 */
      10  new := {x ∈ succ | x ∉ Int & predout(x) = 0};
          /* main loop */
      11  (while ∃ a ∈ new)
      12     (∀y ∈ ({a} - Int), x ∈ E(y) | y ∈ V)
      13        if x ∈ succ & predout(x) = [+: w ∈ {y}]1
                   & x ∉ Int then
      14           new := new + {x};
      15        endif;
      16        predout(x) := predout(x) - [+: w ∈ {y}]1;
      17     end ∀;
      18     (∀z ∈ ([+: w ∈ ({a} - Int)]E(w) - succ) |
                   z ∉ Int & predout(z) = 0)
      19        new := new + {z};
      20     end ∀;
      21     succ := succ + ([+: w ∈ ({a} - Int)]E(w) - succ);
      22     (∀y ∈ ({a} - Int) | y ∈ succ & predout(y) = 0)
      23        new := new - {y};
      24     end ∀;
      25     Int := Int + {a};
          end while;

Although (35) invites cleanup at almost every other
line, it already represents a considerable speedup over
(27): we can expect (35) to run at worst in time
proportional to #E.
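A Python sketch of (35) shows where the bound on #E comes from: each node a enters Int once, and its out-edges E(a) are scanned a constant number of times. The name interval_fd, the dict-of-sets representation of E, and the assumption that every endpoint of an edge lies in V are choices made for this illustration. Because new never contains a member of Int, {a} - Int is simply {a} in the loop body.

```python
from collections import defaultdict


def interval_fd(V, E, h):
    """Interval with header h, following the differentiated program (35)."""
    Int = {h}
    # Lines 2-8: predout(x) = number of predecessors of x lying outside Int.
    predout = defaultdict(int)
    for y in V:
        if y not in Int:
            for x in E.get(y, set()):
                predout[x] += 1
    # Line 9: succ = [+: w in Int] E(w).
    succ = set()
    for w in Int:
        succ |= E.get(w, set())
    # Line 10: new = {x in succ | x not in Int & predout(x) = 0}.
    new = {x for x in succ if x not in Int and predout[x] == 0}
    while new:                                      # line 11
        a = next(iter(new))
        out = E.get(a, set())
        # Lines 12-17: a stops being an outside predecessor of its successors.
        for x in out:
            if x in succ and predout[x] == 1 and x not in Int:
                new.add(x)
            predout[x] -= 1
        # Lines 18-20: successors of a that were not yet in succ.
        for z in out - succ:
            if z not in Int and predout[z] == 0:
                new.add(z)
        succ |= out                                 # line 21
        new.discard(a)                              # lines 22-24
        Int.add(a)                                  # line 25
    return Int
```

On any graph the result should coincide with a literal evaluation of (27); only the amount of work per iteration differs.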
As a last example of algorithmic improvement by
formal differentiation, we consider Haberman's Banker's
Algorithm for detecting deadlock amongst concurrent
processes competing for resources (e.g., in an operating
system environment). This algorithm models resource
allocation by allocating 'loans' to customers who make
known demands. Initially each customer c will have a
quantity, loan(c), already preallocated to him, but he
also requests claim(c) more money. Once his total demand
is met, he will repay the bank his entire borrowed amount
within a finite amount of time.

The  bank  starts  out  with  an  unallocated  sum,  cash, 
which  it  must  use  to  satisfy  the  full  demands  of  all  of 
the  customers   one  after  the  other. 

The version of Haberman's algorithm given here uses
only one kind of currency (equivalent to one resource type).
Its strategy is to meet the demands of any customer c whose
claim does not exceed the bank's available cash. The bank will
then wait until c makes full repayment and is no longer a
customer before scheduling any more customers. If all
customers have been eliminated when the algorithm terminates,
the original configuration of loans is safe (i.e., a
deadlock can be avoided); otherwise it is not.

A  base  form  SETL  version  of  this  algorithm  is 




(36)     /* cus is the set of customers */
      1  (while ∃ c ∈ cus | claim(c) ≤ cash)
      2     cash := cash + loan(c);
      3     cus := cus - {c};
      4  end while;
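As a baseline, (36) can be transcribed into Python as below. The function name banker_naive and its return value (the customers left unserved, together with the final cash) are assumptions of the sketch; an empty set of remaining customers corresponds to a safe configuration. Each iteration may scan all of cus, so the loop can take time quadratic in #cus.

```python
def banker_naive(cus, claim, loan, cash):
    """Base form (36): repeatedly serve any customer whose claim <= cash."""
    cus = set(cus)
    while True:
        c = next((c for c in cus if claim[c] <= cash), None)
        if c is None:
            return cus, cash          # empty cus means the loans were safe
        cash += loan[c]               # c repays everything it borrowed
        cus.remove(c)
```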

To reduce (36), only one preparatory transformation is
required. Specifically, P6 of Appendix D (v) should be
applied to the existential quantifier within the while
loop. The while loop predicate will then appear as

(37)  ∃ c ∈ {x ∈ cus | claim(x) ≤ cash}

which  is  ready   for  reduction.   A  user  can  now  issue  the 
directive 

(38)  $FD, 1, Gcus = {x ∈ cus | claim(x) ≤ cash}

to  begin  a  rapid  and  dramatic  transformation  of  (36) . 

Since IV1 = {cus}, IV2 = {cash}, and RC = {claim, loan},
Gcus is reducible and matches elementary form 1 of F, with
variables cus, claim, and cash matching the corresponding
parameters x1, F5, and x2 of form 1. Of these
parameters, only x1 and x2 undergo change: cus is defined
at line 3 and cash at line 2, and these changes match parameter
changes in D corresponding to derivative entries 1b and 1aa.
These entries lead to the following prederivative code:

      /* update code due to change in cus */
(39)  (∀x ∈ ({c} * cus) | claim(x) ≤ cash)
         Gcus := Gcus - {x};
      end ∀;
and

(40)  /* due to change in cash */
      (while xmin11 ≤ cash + loan(c))
         (∀x ∈ {u ∈ cus | claim(u) = xmin11})
            Gcus := Gcus + {x};
         end ∀;
         xmin11 := succ11(xmin11);
      endwhile;

The variables xmin11 and succ11 appearing in (40) must
be initialized on entry to the loop. The initializing
code for Gcus stems from entry 1 of Init and is given by
the following code:

(41)  sortas(claim[cus], 11);
      xmin11 := [min: w ∈ claim[cus] | w > cash] w;
      Gcus := {x ∈ cus | claim(x) ≤ cash};

where sortas sorts claim[cus] in ascending order and
produces successor and predecessor maps, succ11 and pred11,
for traversing this sorted set of numbers.
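The text describes sortas only by its effect, so the following Python stand-in is an assumption about its behaviour: it sorts the distinct claim values and builds succ and pred maps over them, with None marking the ends (where SETL would yield Ω). The helper first_above, a hypothetical name, computes the initial xmin11 of (41) by binary search rather than by the [min: ...] scan.

```python
import bisect


def sortas(values):
    """Sort the distinct values ascending; return them with succ/pred maps."""
    order = sorted(set(values))
    succ = {v: (order[i + 1] if i + 1 < len(order) else None)
            for i, v in enumerate(order)}
    pred = {v: (order[i - 1] if i > 0 else None)
            for i, v in enumerate(order)}
    return order, succ, pred


def first_above(order, cash):
    """Smallest value in the sorted list that is strictly greater than cash."""
    i = bisect.bisect_right(order, cash)
    return order[i] if i < len(order) else None
```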

For FD to be profitable, the derivative table entry 1hh
requires that the expression c1(xmin11) = {u ∈ cus | claim(u) = xmin11}
be reducible. Hence, our FD system will define c1
by inserting the following code,


(42)  c1 := nullset;
      (∀x ∈ cus)
         if claim(x) ∈ Dom c1 then
            c1(claim(x)) := c1(claim(x)) + {x};
         else c1(claim(x)) := {x};
         endif;
      end ∀;

at the end of the prologue. Just before line 3 of (36),
the prederivative code

(43)  (∀x ∈ {c} * cus)
         c1(claim(x)) := c1(claim(x)) - {c};
      end ∀;

obtained from entry 1b of D will be inserted. Finally,
c1(xmin11) will replace the calculation it represents
within (40).

The  version  of  the  Banker's  Algorithm  which  results 
from  all  this  runs  an  order  of  magnitude  faster  than  (36), 
and  is  as  follows: 
(44)      /* prologue */
       1  sortas(claim[cus], 11);
       2  xmin11 := [min: w ∈ claim[cus] | w > cash] w;
       3  Gcus := {x ∈ cus | claim(x) ≤ cash};
       4  goodc := nullset;                  /* goodc = c1 */
       5  (∀x ∈ cus)
       6     if claim(x) ∈ Dom goodc then
       7        goodc(claim(x)) := goodc(claim(x)) + {x};
       8     else goodc(claim(x)) := {x};
       9     endif;
      10  end ∀;
          /* main loop */
      11  (while ∃ c ∈ Gcus)
      12     (while xmin11 ≤ cash + loan(c))
      13        (∀x ∈ goodc(xmin11))
      14           Gcus := Gcus + {x};
      15        end ∀;
      16        xmin11 := succ11(xmin11);
      17     endwhile;
      18     cash := cash + loan(c);
      19     (∀x ∈ ({c} * cus) | claim(x) ≤ cash)
      20        Gcus := Gcus - {x};
      21     end ∀;
      22     (∀x ∈ ({c} * cus))
      23        goodc(claim(x)) := goodc(claim(x)) - {x};
      24     end ∀;
      25     cus := cus - {c};
      26  endwhile;


The above code requires some cleanup, but its asymptotic
speed is still good: at worst proportional to #cus log #cus.
Note that the sort operation at line 1 may require this
much time; however, when #claim[cus] << #cus, the expected
running time of (44) will be O(#cus).
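The following Python sketch mirrors (44) under the same assumptions as the baseline: claims and loans are dicts, loans are nonnegative so cash never decreases, and None plays the role of Ω once xmin11 runs past the largest claim. The name banker_fd is hypothetical. The inner while loop compares with ≤ so that the maintained set agrees with the definition Gcus = {x ∈ cus | claim(x) ≤ cash}; each distinct claim value is visited at most once, so apart from the initial sort the work is linear in #cus.

```python
import bisect
from collections import defaultdict


def banker_fd(cus, claim, loan, cash):
    """Differentiated Banker's Algorithm in the style of (44).

    Returns the customers left unserved and the final cash.
    """
    cus = set(cus)
    # Line 1: sortas(claim[cus], 11) -> distinct claims in order plus succ11.
    order = sorted({claim[x] for x in cus})
    succ11 = {v: (order[i + 1] if i + 1 < len(order) else None)
              for i, v in enumerate(order)}
    # Line 2: xmin11 = smallest claim value strictly above cash (None if none).
    i = bisect.bisect_right(order, cash)
    xmin11 = order[i] if i < len(order) else None
    # Line 3: Gcus = {x in cus | claim(x) <= cash}.
    gcus = {x for x in cus if claim[x] <= cash}
    # Lines 4-10: goodc(v) = {x in cus | claim(x) = v}.
    goodc = defaultdict(set)
    for x in cus:
        goodc[claim[x]].add(x)
    while gcus:                                        # line 11
        c = next(iter(gcus))
        # Lines 12-17: admit every customer whose claim the new cash covers.
        while xmin11 is not None and xmin11 <= cash + loan[c]:
            gcus |= goodc[xmin11]
            xmin11 = succ11[xmin11]
        cash += loan[c]                                # line 18
        gcus.discard(c)                                # lines 19-21
        goodc[claim[c]].discard(c)                     # lines 22-24
        cus.discard(c)                                 # line 25
    return cus, cash
```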
E. Conclusion

i. Implementation Plans

The two main goals proposed in this thesis for future
work are the implementation of an interactive source-to-source
Subsetl program manipulation system and the mechanization
of formal differentiation for Subsetl. Since SETL
provides machine portability, an interactive capability, and
readability, and minimizes programming effort, we will
use SETL to implement these projects.

The actual work can be completed incrementally in
six phases. In the first three phases we plan to implement
the basic source-to-source transformational system, which
also supports the FD design of Chapter 4 (C). This
initial system will then be extended in the next three
phases to make it useful for experimenting with and
automating formal differentiation. Ultimately we hope
to incorporate FD as part of an optimizing compiler.

A brief description of each phase of our proposed
implementation project is as follows:
Phase 1.

First we plan to construct a basic interactive source-to-source
Subsetl program improvement system as described
in Chapter 3 (A, B). For this, we require the unparser,
pattern matcher, and macro expander given in Appendix E
(i-iii) and a transformational library containing several
of the set theoretic transformations found in Appendix D.
In addition we must supply a Subsetl parser and a command
processor which interfaces the transformational system
and user.
Phase 2.

Next, we want to accomplish control flow, data flow,
and type analysis. Once this is done, we can proceed
to implement semiautomatic FD for completely continuous
Subsetl expressions (cf., Chapter 3 (C)).
Phase 3.

In this last preliminary phase, the simple pattern
matching and macro expansion routines introduced in phase 1
should be replaced by the more powerful pattern handling
procedures shown in Appendix E (v). Then we can implement
the more ambitious semiautomatic FD implementation of
Chapter 4 for general Subsetl expressions.
Phase  4. 

Starting with the semiautomatic implementation completed
in phase 3, we can now proceed to study program derivations
which depend on formal differentiation, so that we might gain
sufficient empirical evidence to make FD fully automatic.
To attain this goal, we must be able to mechanize three
main tasks:

a. Reducible expressions must be recognized before the
preparatory transformations that produce them are even
applied (cf., p. 470 for some initial ideas on how this
can be done).

b. Some of the reducible expressions recognized in
the preceding step can then be selected for reduction.
This will trigger application of a chain of preparatory
transformations (using Kibler's chaining technique; cf.,
Appendix D (x)).

c. Preparatory transformations will then segue
into successive reduction steps interleaved with applications
of chains of cleanup transformations (once again using
Kibler's chaining technique).

There is little doubt that an effective mechanization
of set theoretic FD will require declarations to supply
important program facts which cannot be determined by
automatic program analysis. For example, profitable
differentiation of c2 = {x ∈ s | q ∈ f(x)} (cf., (29) of
Chapter II (D)) depends on the property

(1)  #f(x) << #s   for each x ∈ s,

which would have to be declared in any practical implementation.
Of course, when f is the successor or predecessor
map defined on s, then any declaration stating that (f, s)
represents a tree, a dag, or a flow graph would lead to
the expectation that condition (1) holds. Note that
differentiation of the topological sort and transitive
closure algorithms in the last section requires condition (1).
We encounter another example of a program property which
must be declared when choosing between two competing rules
for reducing

(2)  c1 = {x ∈ s | f(x) < q}.

To differentiate (2) relative to incremental changes in q,
one reduction method is preferred when the image set of
integers f[s] is 'dense', while another method works much
better when f[s] is 'sparse' (cf., (27) of Chapter II (D)).
The latter method is used to differentiate the Banker's
Algorithm examples in Chapter 4 (D) and Appendix F.
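One plausible reading of the 'dense' method is sketched below in Python: elements of s are bucketed by their integer image f(x), and raising q walks every integer value between the old and the new threshold. The class name BelowThreshold and its methods are illustrative assumptions, and the sketch handles only increases of q. Walking every integer is cheap when f[s] is dense; when f[s] is sparse most buckets are empty, which is why walking a sorted successor map of the distinct values, as in (41), is the better choice there.

```python
from collections import defaultdict


class BelowThreshold:
    """Keep c1 = {x in s | f(x) < q} current while q only increases."""

    def __init__(self, s, f, q):
        self.q = q
        self.buckets = defaultdict(set)        # value v -> {x in s | f(x) = v}
        for x in s:
            self.buckets[f[x]].add(x)
        self.c1 = {x for x in s if f[x] < q}

    def raise_q(self, new_q):
        # Walk every integer value in [q, new_q); each bucket reached joins c1.
        for v in range(self.q, new_q):
            self.c1 |= self.buckets.get(v, set())
        self.q = max(self.q, new_q)
```

The cost of a call to raise_q is proportional to the distance q moves plus the number of elements admitted.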

Once we work out declarations for specifying such
properties as sparsity, and once these declarations are
sufficient to enable a fully automatic implementation of FD,
we will study how to incorporate automatic FD as part
of the SETL compiler. FD is a higher level optimization
and naturally precedes the techniques of automatic data
structure selection which have recently been added to the
SETL optimizer.

Note, finally, that the success of installing FD as
part of the SETL compiler may ultimately rest on the
efficiency of the implementation. Thus, it may be necessary
to improve our pattern handling techniques and to explore
inexpensive ways to perform data flow and type analysis
incrementally.
Phase 5.

Assuming that FD has been successfully automated in
phase 4, we can proceed to expand the interactive system
so that it can manipulate source programs written in full
standard SETL. Next we will enlarge our initial collection
of set theoretic FD transformations, and will explore
new techniques for reducing a more general class of
expressions.

Among those FD transformations which seem like promising
candidates for inclusion into our system are the various
incarnations of Rule 2. We can also admit FD table entries
for two new reducible expression forms roughly corresponding
to the set formers

(3)  {x ∈ F1(q1, ..., qn) | K1(x, t1, ..., tm) or ¬K2(x, b1, ..., bl)}

based on (49) of Chapter II (D), and

(4)  {K(x) : x ∈ F2(t1, ..., tm)}

related to (56) of Chapter II (D). Note that within (3)
and (4) the parameters qi, i = 1, ..., n, ti, i = 1, ..., m,
and bi, i = 1, ..., l, are discontinuities, where
n, m, l ≥ 0; F1 must be a region constant; F2 must belong
to the induction set IV (cf., p. 456); K1 is restricted
in the same way as the parameter K appearing in Form 1 of
the F table in Appendix C (iii); K2 is of the same form
as K1, but every free variable occurring within K2 but
outside of any bi, i = 1, ..., l, must be a region constant;
K must be constrained by the condition Restrict(K, f)
(cf., p. 427). We can further broaden the class of
reducible expressions and useful FD transformations by
adding appropriate FD entries for handling several of
the examples discussed in Appendix F.

It may be somewhat more difficult but still worthwhile
to study reduction techniques for handling discontinuous
expressions some of whose discontinuity parameters require
range analysis. An example of such an expression is

(5)  c1(y) = {z ∈ y | z ∉ s}

which must be reduced in order to improve the grammar
algorithm (6) of Appendix F. To reduce (5) we determine
the range of values D of the discontinuity y by interrogating
the Usetodef map. However, in more complicated
situations range analysis may require comprehensive value
flow (Sch8). In still more difficult situations it may
be easier to fall back on range declarations.

We can also handle general discontinuities (which may
not be removable) using the 'memo' function technique
discussed in Chapters 1 and 2. Using this method, we are
able to reduce all applicative expressions, but accurate
speedup estimates are not generally possible since they
are likely to depend on undecidable facts; i.e., improvement
in running time will depend strongly on a large number of loop
iterations and on small ranges of values for the discontinuities.
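A minimal Python sketch of the memo function idea, applied to (5) for concreteness: values of c1(y) are computed on first use for each argument y and then kept current as s grows. The class name MemoDiff and its methods are hypothetical. Each change to s touches every stored entry, which illustrates why the payoff depends on few distinct values of the discontinuity and many loop iterations.

```python
class MemoDiff:
    """Memo function for c1(y) = {z in y | z not in s}, y a discontinuity."""

    def __init__(self, s):
        self.s = set(s)
        self.memo = {}                     # frozenset(y) -> current c1(y)

    def c1(self, y):
        key = frozenset(y)
        if key not in self.memo:           # first use: evaluate from scratch
            self.memo[key] = {z for z in key if z not in self.s}
        return set(self.memo[key])         # hand out a copy, keep the memo private

    def add_to_s(self, e):
        """s := s + {e}, patching every memoized value instead of discarding it."""
        if e in self.s:
            return
        self.s.add(e)
        for value in self.memo.values():
            value.discard(e)
```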
As is noted in Appendix F (cf., (28-30), (40)), it can
be useful to develop general techniques to eliminate
redundancy occurring among the discontinuities of an
expression. For example, straightforward prederivative
code for the setformer

(6)  c1(q1, q2, q1) = {x ∈ s | q1 ∈ K1(x) & q2 ∈ K2(x) & q1 ∈ K3(x)}

relative to a change s := s + A is

(7)  (∀x ∈ (A - s), t1 ∈ K1(x), t2 ∈ K2(x), t3 ∈ K3(x))
        if c1(t1, t2, t3) ≠ Ω then
           c1(t1, t2, t3) with x;  else
           c1(t1, t2, t3) := {x};
        endif;
     end ∀;

But by exploiting the redundant use of q1 within (6), we
can rewrite (7) using the following more efficient code:

(8)  (∀x ∈ (A - s), t1 ∈ K1(x), t2 ∈ K2(x) | t1 ∈ K3(x))
        if c1(t1, t2, t1) ≠ Ω then
           c1(t1, t2, t1) with x;  else
           c1(t1, t2, t1) := {x};
        endif;
     end ∀;

In fact we can reform c1 as a bivariate map by eliminating
one of the q1 parameters.
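The saving can be checked with a small Python experiment that builds the index of (7), keyed by triples, and a bivariate index in the spirit of (8) and the remark above, keyed by (t1, t2) and filtered by t1 ∈ K3(x). The function names and the representation of K1, K2, K3 as Python functions returning sets are assumptions of this sketch. Both indices answer c1(q1, q2, q1) identically, but the second never enumerates K3(x).

```python
from collections import defaultdict


def index_trivariate(A, K1, K2, K3):
    """Update (7) from an empty s: one entry per triple (t1, t2, t3)."""
    c1 = defaultdict(set)
    for x in A:
        for t1 in K1(x):
            for t2 in K2(x):
                for t3 in K3(x):
                    c1[(t1, t2, t3)].add(x)
    return c1


def index_bivariate(A, K1, K2, K3):
    """Update in the style of (8): only triples with t3 = t1 are ever queried,
    so test t1 in K3(x) and key the map on (t1, t2) alone."""
    c1 = defaultdict(set)
    for x in A:
        k3 = K3(x)
        for t1 in K1(x):
            if t1 in k3:
                for t2 in K2(x):
                    c1[(t1, t2)].add(x)
    return c1
```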

The preceding example leads to a general rule which
sometimes may be applied to handle redundant discontinuities.
Consider an expression

(9)  f(x_1, ..., x_k, x_{k+1}, ..., x_n)

which depends continuously on modifications in x_i, i = 1, ..., k,
and discontinuously on the remaining parameters. Suppose
that the prederivative code for (9) which compensates
for a change x_i := A, i ≤ k, is of the following form:

(10)  (∀[t_{k+1}, ..., t_n] ∈ Project(n-k, c))
         c(t_{k+1}, ..., t_n) := δ_{c(t_{k+1},...,t_n)};
      end ∀;

In order to handle the following new expression

(11)  c1(x) = f(x_1, ..., x_k, x, x, ..., x)

formed from (9) by replacing each discontinuity parameter
of (9) by x, we note first of all that the map c1, which
stores separate calculations of (11), only uses a single
parameter corresponding to x. We also observe that the
prederivative code for (11) relative to a change
x_i := A, i ≤ k, may be derived rather easily from (10);
i.e., this update code is

(12)  (∀t ∈ Dom c1 | [t, t, ..., t] ∈ Project(n-k+1, c1{t}))
         c1(t) := δ_{c1(t)};
      end ∀;
where the expression δ_{c1(t)} appearing in (12) can be
constructed from the expression δ_{c(t_{k+1},...,t_n)} appearing
in (10) using the following substitution steps:

1. First, we replace occurrences of c(t_{k+1}, ..., t_n)
within δ_{c(t_{k+1},...,t_n)} by occurrences of c1(t).

2. Then we replace all remaining occurrences of
t_{k+1}, ..., t_n by occurrences of t.

Techniques which eliminate redundant discontinuities
not only help to improve reduction of differentiable
expressions, but also allow us to reduce expressions not
normally suited for FD. For example, although the setformer

c1(q_1, ..., q_n) = {x ∈ s | K_1(x) = q_1 or ... or K_n(x) = q_n}

is not reducible, the related setformer

(13)  c2(q) = {x ∈ s | K_1(x) = q or ... or K_n(x) = q}

can be reduced. To see this, note that (13) can be transformed
into the following equivalent form,

(14)  c2(q) = {x ∈ s | [+: y ∈ {1 ≤ i ≤ n | K_i(x) = q}]1 ≠ 0}

The innermost subexpression {1 ≤ i ≤ n | K_i(x) = q} within
(14) may be reduced in a manner similar to (38) of Chapter 2 (D).
It is clear how to reduce the other subexpressions.
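To see why the count in (14) makes c2 reducible, here is a minimal Python sketch that keeps count(q, x) = #{i | K_i(x) = q} and c2(q) current under strict additions to and deletions from s. The class name OrOfEqualities is hypothetical, and the K_i are assumed to be Python functions returning hashable values. Each update costs n steps, independent of #s.

```python
from collections import defaultdict


class OrOfEqualities:
    """Maintain c2(q) = {x in s | K_1(x) = q or ... or K_n(x) = q} via (14)."""

    def __init__(self, Ks):
        self.Ks = Ks                                        # the functions K_i
        self.count = defaultdict(lambda: defaultdict(int))  # q -> x -> #{i | K_i(x) = q}
        self.c2 = defaultdict(set)                          # q -> {x | count(q, x) != 0}

    def add(self, x):
        """Strict addition s := s + {x} (x assumed not already in s)."""
        for K in self.Ks:
            q = K(x)
            self.count[q][x] += 1
            self.c2[q].add(x)

    def remove(self, x):
        """Strict deletion s := s - {x} (x assumed to be in s)."""
        for K in self.Ks:
            q = K(x)
            self.count[q][x] -= 1
            if self.count[q][x] == 0:       # no K_i maps x to q any more
                self.c2[q].discard(x)
```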
Phase 6.

In this final experimental phase, we plan to uncover
and then implement useful very high level dictions optimizable
by FD. One example is Schwartz's 'Pursue Block'
(Sch11, Sch12),
(15)  (∀∀)
         Block
      end ∀∀;

which causes Block to be executed repeatedly until such
execution results in no change in value to any variable.
In addition to the form (15), Schwartz also allows pursue
blocks to involve bound variables, e.g.,

(16)  (∀∀x_1 ∈ s_1, ..., x_n ∈ s_n(x_1, ..., x_{n-1}) | K(x_1, ..., x_n))
         Block(x_1, ..., x_n)
      end ∀∀;

which does the same thing as

(17)  (while ∃ x_1 ∈ s_1, ..., x_n ∈ s_n(x_1, ..., x_{n-1}) | K(x_1, ..., x_n) &
            execution of Block would result in a change of state)
         Block(x_1, ..., x_n)
      end while;

but much more concisely.
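The semantics of (15) and (17) can be pinned down with a short Python sketch. The functions pursue and pursue_bound, and the representation of program state as a dictionary of variables that a block maps to a new state, are assumptions made for illustration; the condition K is taken to be folded into the candidates function. A pursue block that never stabilizes would loop forever here as well.

```python
def pursue(state, block):
    """(15): execute block until an execution leaves every variable unchanged."""
    while True:
        new_state = block(state)
        if new_state == state:
            return state
        state = new_state


def pursue_bound(state, candidates, block):
    """(17): while some binding x makes block(x) change the state, execute it."""
    while True:
        for x in candidates(state):
            new_state = block(state, x)
            if new_state != state:
                state = new_state
                break
        else:                      # no binding changed anything: done
            return state
```

Used with the block s := s + f{a}, either function computes a transitive closure, although each round re-examines every binding.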

Certainly, the inefficiency of the pursue block (16)
can be a high price to pay for clarity. However, in situations
where formal differentiation can improve (17), this
cost can be avoided. Note in particular that the base form
transitive closure algorithm of Chapter 4 (D) (cf., (14))
can be written more simply as
(18)  (∀∀a ∈ s)
         s := s + f{a};
      end ∀∀;

and the main loop of the base form available expressions
algorithm found in Appendix F (cf., (36)) can be crisply
stated using the following pursue block,

(19)  (∀∀n ∈ Np)
         AE(n) := [*: y ∈ pred(n)] ((AE(y) * PR(y)) + XE(y));
      end ∀∀;

The preceding examples, (18) and (19), represent two of
the four main cases in which Pursue Blocks of the form

(20)  (∀∀x ∈ s | K(x))
         statement
      end ∀∀;

may be reduced. These four cases correspond to the following
four 'monotonic' forms for statement:

(21)  A := A + exp(x)

      A(x) := A(x) + exp(x)

where exp(x) is any expression involving x.

To illustrate one of these cases, consider the Pursue
Block

(22)  (∀∀x ∈ s | K(x))
         A(x) := A(x) + exp(x);
      end ∀∀;
which can be implemented by the following lower level code,

(23)  (while ∃ x ∈ s | K(x) & ((exp(x) - A(x)) ≠ nullset))
         A(x) := A(x) + exp(x);
      end while;

Without much ado, (23) can be rewritten in the following form

(24)  (while ∃ x ∈ {y ∈ s | K(y) & [+: w ∈ {z ∈ exp(y) | z ∉ A(y)}]1 ≠ 0})
         A(x) := A(x) + exp(x);
      end while;

suitable for differentiation. The other three cases are
handled in much the same way.
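As one concrete instance of the differentiated form, suppose (an assumption, not a case singled out in the text) that exp(x) = gen(x) + [+: y ∈ deps(x)] A(y) and that K(x) holds for every x. Then the set of x for which exp(x) - A(x) is nonempty can only gain members among the users of an A(y) that just grew, which gives the familiar worklist scheme sketched below in Python. The names pursue_union, gen, deps, and users are hypothetical.

```python
from collections import deque


def pursue_union(nodes, gen, deps):
    """Fixed point of (22) when exp(x) = gen(x) + [+: y in deps(x)] A(y).

    A worklist holds the x whose exp(x) may have gained elements since A(x)
    was last updated; only those need the test exp(x) - A(x) != nullset.
    """
    A = {x: set() for x in nodes}
    users = {x: set() for x in nodes}      # users(y) = {x | y in deps(x)}
    for x in nodes:
        for y in deps[x]:
            users[y].add(x)
    work = deque(nodes)
    queued = set(nodes)
    while work:
        x = work.popleft()
        queued.discard(x)
        exp = set(gen[x]).union(*(A[y] for y in deps[x]))
        if exp - A[x]:                     # the loop test of (23)
            A[x] |= exp                    # A(x) := A(x) + exp(x)
            for u in users[x]:             # only users of A(x) can change
                if u not in queued:
                    queued.add(u)
                    work.append(u)
    return A
```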

One long-range implementation goal is to extend formal
differentiation so that it can apply to full SETL procedures.
However, before we are able to recognize the continuity
properties of full procedures and to differentiate them
automatically, it may be useful to augment SETL with
declarations which state a procedure's continuity properties
and associated derivative rules. This extended FD capability
would facilitate the construction of large modular incremental
programs; e.g., incremental metaparsers continuous
relative to slight grammatical changes or incremental data
flow algorithms continuous with respect to changes in local
maps and the control flow graph.
ii. Various Implications and Applications of Formal
Differentiation

There  are  many  interesting  side  issues  which  can  be 
pursued  in  parallel  with  the  main  implementation  goals  just 
mentioned.   Some  of  these  are  listed  as  follows: 

1. The most general algorithms presented in this thesis
for recognizing differentiable expressions and performing the
actual reductions have been derived from the heuristic
notion of 'continuity', and can be used for any procedural
programming language. To implement FD for a language P,
we must select either the automatic or semiautomatic FD
routines described in Chapters 3 (C) and 4 (B), and must also
construct the FD tables which encode the continuity properties
of some of the primitive operations of P. Thus, it seems
both feasible and worthwhile to design FD implementations for
various high level languages such as Snobol and APL.

2.  It is somewhat more speculative but still interesting to
apply FD in a simplified data base context, where we neglect
issues of sharing and limit storage to main memory.  Our basic
idea is to make SETL an incrementally compiling and interactive
language, so that directives containing SETL source
code to construct, modify, and query a data base can be issued
from a terminal.  These directives will have the form

(25)  op, setlsource

where op is an operation code, either 'construct', 'query',
or 'modify', and setlsource is a block of SETL source statements
restricted in the following way.  1. Setlsource
for 'construct' operations contains statements which assign
initial values to certain 'elementary' variables of the data
base.  2. Queries are SETL code sequences which use these
'elementary' variables (without modifying them) to form
expressions which can be printed.  3. Directives having the
operation code 'modify' contain SETL code which changes
'elementary' variables differentially, i.e., in a way
ensuring that these variables are 'inductive'.

A finite sequence of SETL source statements laid out
from consecutive directives may be seen as forming a rather
stylized straight line SETL program P having a characteristically
high degree of repetitive code (as would normally
be found only in a program loop).  Thus, we can anticipate
opportunities to improve P by applying various peephole
transformations as well as the more global techniques of
redundant code elimination, formal differentiation, and
data structure selection.

Unfortunately, program optimization methods depend on
the availability and analysis of complete programs, and in
the interactive milieu in which our data base model resides,
only a part P' of our 'program' P is available for analysis:
the part which has already been executed.  There always
remains a significant unknown portion P" of P formed from
directives yet to be issued.  Nevertheless, we expect the
code of P to be sufficiently repetitive that properties of
P" (especially some initial segment of P") can be predicted
from analysis of P' (especially some final portion of P').
Consequently, it seems plausible that various program
optimization techniques, reformulated in a minor way for
run time use, can improve the processing of our data base
directives.

In particular, we will show how formal differentiation
might be used to optimize queries.  Consider, as an example,
a data base used by an airline company.  Suppose that the
'elementary variables' of this data base are

1.  A set Flights of flight numbers;

2.  A map Strt associating each flight n ∈ Flights with a
starting location Strt(n);

3.  A map dest associating each flight n ∈ Flights with a
destination dest(n);

4.  A map Pass associating each flight n with a set Pass(n)
of passengers scheduled to fly on flight n;

5.  A map Food associating each flight n and each passenger
p ∈ Pass(n) with a meal selection Food(p,n).

Then we can initialize the data base using the following
directive:

(26)  'construct', Read(Flights, Strt, dest, Pass, Food);

Once  the  Read  statement  within  (26)  is  parsed,  compiled, 
and  executed,  we  can  issue  an  assortment  of  queries  which  use 
the  five  elementary  variables  just  defined.  Some  examples  are 
(27)  'Query', Print(#{f ∈ Flights | Strt(f) = 'New York'
                                    & dest(f) = 'Paris'});

(28)  'Query', Print(∃p ∈ Pass(142) | Food(p,142) = 'Kosher');

and

(29)  'Query',
      Q := {f ∈ Flights | ∃p ∈ Pass(f) | Food(p,f) = 'Fish'};
      (∀f ∈ Q)
          Print(f, {p ∈ Pass(f) | Food(p,f) = 'Fish'});
      end ∀;
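To make the queries concrete, here is a small sketch that evaluates each one 'from scratch', which is exactly the cost formal differentiation aims to avoid. It assumes that Python sets and dicts stand in for the SETL sets and maps, and that Food is keyed by (passenger, flight) pairs, as in the uses Food(p,f) above. The sample flights, passengers, and meals are invented for illustration.

```python
# Hypothetical sample data for the five elementary variables.
Flights = {101, 142, 205}
Strt = {101: 'New York', 142: 'New York', 205: 'Boston'}
dest = {101: 'Paris', 142: 'Paris', 205: 'Paris'}
Pass = {101: {'ann', 'bob'}, 142: {'cy', 'di'}, 205: {'ed'}}
# Food is keyed by (passenger, flight), matching Food(p, f) in (28)-(29).
Food = {('ann', 101): 'Fish', ('bob', 101): 'Beef',
        ('cy', 142): 'Kosher', ('di', 142): 'Fish',
        ('ed', 205): 'Beef'}


def query_27(start, destination):
    """#{f in Flights | Strt(f) = start & dest(f) = destination}."""
    return len({f for f in Flights
                if Strt[f] == start and dest[f] == destination})


def query_28(flight, meal):
    """Is there a p in Pass(flight) with Food(p, flight) = meal?"""
    return any(Food[(p, flight)] == meal for p in Pass[flight])


def query_29(meal):
    """For each flight serving `meal`, the passengers who chose it."""
    Q = {f for f in Flights if any(Food[(p, f)] == meal for p in Pass[f])}
    return {f: {p for p in Pass[f] if Food[(p, f)] == meal} for f in Q}
```

With this data, query_27('New York', 'Paris') rescans all of Flights on every call, and query_29('Fish') rescans every passenger list.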

To ensure that the five elementary variables are inductive,
we only allow code within 'modify' directives to change
the maps Strt, dest, and Food by indexed assignments; also,
the set Flights and each set Pass(n) for a given flight n
may only vary by differential set additions and deletions.

Most  of  the  techniques  needed  to  optimize  queries  have 
been  worked  out  in  previous  sections  of  this  thesis.  However, 
we  also  require  that  additional  techniques  mentioned  in 
phase  4  of  the  FD  implementation  proposed  in  the  preceding 
subsection  be  available.   Thus,  we  assume  that  FD  is  fully 
mechanized,  and  specifically  that  hidden  reducible  expressions 
can  be   recognized   before  they  would  be  exposed 
by  application  of  preparatory  transformations. 

We  now  illustrate  how  the  queries  (27)- (29)  might  be 
improved  using  formal  differentiation.  First  consider  the 
case  of  (27).   Once  the  Print  statement  appearing  in  (27)  is 
parsed,  we  will  recognize  two  reducible  expressions. 

c1(q1,q2) = {f ∈ Flights | Strt(f) = q1 & dest(f) = q2}

and

c2(q1,q2) = [+: x ∈ c1(q1,q2)] 1.
Note that we ignore the fact that 'New York' and 'Paris'
appearing in (27) are region constants, and instead treat
(27) as containing two removable discontinuities q1 and q2.
This strategy is taken in anticipation of encountering
other reducible expressions differing from c1 only with
respect to the bound variable f and the constants used in place
of the parameters q1 and q2.  All such expressions would be
kept available by reduction of c1 (cf. the rules of (36)
of Chapter 2(D)).  It is also practical to reduce c1 under
the assumption that all continuity variables Flights, Strt,
and dest can be modified.
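Here is a minimal sketch of what keeping c1 and c2 reduced could look like. It is written in Python and assumes that the three continuity variables change only through the operations shown. The class and method names (ReducedC1, add_flight, set_strt, and so on) are hypothetical. All pairs (q1, q2) are kept at once, in line with treating 'New York' and 'Paris' as removable discontinuities.

```python
from collections import defaultdict


class ReducedC1:
    """Keeps c1(q1, q2) = {f in Flights | Strt(f) = q1 & dest(f) = q2} and
    c2(q1, q2) = #c1(q1, q2) available for every pair (q1, q2) at once."""

    def __init__(self, flights, strt, dest):
        self.flights = set(flights)
        self.strt, self.dest = dict(strt), dict(dest)
        self.c1 = defaultdict(set)   # (q1, q2) -> set of flights
        self.c2 = defaultdict(int)   # (q1, q2) -> number of such flights
        for f in self.flights:       # initialization part of the FD transformation
            self._insert(f)

    def _key(self, f):
        return (self.strt[f], self.dest[f])

    def _insert(self, f):
        self.c1[self._key(f)].add(f)
        self.c2[self._key(f)] += 1

    def _remove(self, f):
        self.c1[self._key(f)].discard(f)
        self.c2[self._key(f)] -= 1

    # Derivative code, run when a 'modify' directive changes Flights, Strt or dest.
    def add_flight(self, f, start, destination):
        if f in self.flights:        # re-adding an existing flight: retract it first
            self._remove(f)
        self.flights.add(f)
        self.strt[f], self.dest[f] = start, destination
        self._insert(f)

    def delete_flight(self, f):
        if f in self.flights:
            self._remove(f)
            self.flights.discard(f)

    def _reassign(self, f, start, destination):
        if f not in self.flights:
            raise KeyError(f)
        self._remove(f)
        self.strt[f], self.dest[f] = start, destination
        self._insert(f)

    def set_strt(self, f, start):           # indexed assignment Strt(f) := start
        self._reassign(f, start, self.dest[f])

    def set_dest(self, f, destination):     # indexed assignment dest(f) := destination
        self._reassign(f, self.strt[f], destination)
```

When c1 and c2 are maintained this way, the Print in query (27) reduces to the retrieval db.c2[('New York', 'Paris')]. Each 'modify' directive pays only for the one or two map entries it touches.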

In order to select expressions for reduction efficaciously,
we keep track of all 'nonsimilar' reducible expressions
(cf. (22) of Chapter 4(B)) and their frequency of occurrence
within some finite number of most recently executed
queries.  In the case of c1 and c2, we must decide for each
of these expressions whether it is similar to an expression
which has been encountered 'frequently'.  If this is the case
for c1, and if c1 is not yet reduced, we will then perform
the initialization part of the FD transformation for c1.  If
c1 is similar to a reduced expression, we can replace the
expression c1 in the query parse tree by the appropriate map
retrieval term.  After c2 is examined in the same way, the
query (27) can be compiled and executed.  As a final step,
we eliminate the reduced forms of all those reduced expressions
whose frequency of occurrence has become too low.
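The bookkeeping described above can be sketched as follows. This is only one possible policy, assumed for illustration. Each query contributes the similarity classes of its reducible expressions (for example, 'c1' for every expression similar to c1). Frequencies are counted over a sliding window of recent queries. The window size and the two thresholds are hypothetical parameters, not values from the thesis.

```python
from collections import Counter, deque


class ReductionPolicy:
    """Decides which similarity classes of reducible expressions to keep reduced."""

    def __init__(self, window=100, reduce_at=5, drop_below=2):
        self.window = window              # number of recent queries remembered
        self.reduce_at = reduce_at        # reduce a class once it is this frequent
        self.drop_below = drop_below      # eliminate a reduced form below this
        self.recent = deque()             # classes used by each recent query
        self.counts = Counter()           # class -> number of recent queries using it
        self.reduced = set()              # classes whose reduced form is maintained

    def observe_query(self, keys):
        """Record one query and return (newly_reduced, eliminated)."""
        keys = frozenset(keys)
        self.recent.append(keys)
        self.counts.update(keys)
        if len(self.recent) > self.window:
            self.counts.subtract(self.recent.popleft())
        # Initialization part of FD for classes that have become frequent.
        newly = {k for k in keys
                 if k not in self.reduced and self.counts[k] >= self.reduce_at}
        self.reduced |= newly
        # Final step: drop reduced forms whose frequency has become too low.
        dropped = {k for k in self.reduced if self.counts[k] < self.drop_below}
        self.reduced -= dropped
        return newly, dropped
```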

After c1 and c2 are reduced, directives which 'modify'
Flights, Strt, or dest will trigger execution of derivative
code to maintain c1 and c2.

It should be expected that heavy use of the airline
data base just described will very quickly establish expressions
c1 and c2 as occurring frequently enough to warrant
reduction.  There may be several other expressions which
will  require  a  longer  period  (to  be  discovered  by  the  user 
community)  before  they  stabilize  in  reduced  form.  At  the 
outset,  queries  for  such  a  data  base  are  likely  to  be 
executed  inefficiently.   Eventually,  however,  the  system 
can  be  expected  to  reach  an  equilibrium  state  in  which  the 
small  number  of  commonly  occurring  expressions  most  useful 
in  the  formation  of  queries  will  be  detected  and  reduced. 

We also expect occasions when persistent occurrences
of usually rare queries will trigger temporary query optimization.
Consider the following scenario.  Somewhat by chance,
a query (28) is issued to inquire whether Kosher food must
be prepared for flight 142.  Such a query establishes uses
of two reducible expressions,

c3(q1,q2) = {p ∈ Pass(q1) | Food(p,q1) = q2}

and

c4(q1,q2) = [+: x ∈ c3(q1,q2)] 1
(cf. (28)-(30) of Appendix F for reduction of c4).
If at the same time there are reports that certain contaminated
fish have been distributed to various flights, then
the probable use of emergency queries such as (29), reinforced
by occasional queries of the form (28), can initiate reduction
of c3 and c4 (which are common to both queries).  Of course,
when the emergency subsides and uses of c3 and c4 become
rare, the maps holding values of c3 and c4 will be eliminated.

3.  Another area of research involves expanding the results
of Chapter 2(D), where we make some initial estimates of the
expected speedup which can result from application of FD
to SETL expressions.  If algorithm speedup and space
utilization can be predicted in advance of successive
applications of formal differentiation, then we move closer to
the point when FD can actually be installed safely as part
of a conventional compiling system.  Of course, results of
this kind will also make FD more powerful by allowing
us to make better choices between competing
transformations.

Studies along these lines might also lead to a more
powerful use of stepwise refinement to prove the space and
time requirements of an algorithm in addition to its
correctness.  For this purpose we would apply FD transformations
to a base form algorithm to determine an asymptotic speed
and space estimate.  We would then apply data structure
selection transformations to obtain closer estimates of
the constant factors involved in the analysis of the final
efficient form of the algorithm.

4.  The heuristic notion of 'continuity' frequently arises
in algorithm design and analysis.  It seems highly
worthwhile to study this property further with regard to
more data structures than those used in the runtime
environment of SETL.  This in turn will shed new light on the
continuity properties of full algorithms (which must be
implemented using data structures and associated operations).
In [Sch 6] Schwartz elaborates on this thought.

Something close to the heuristic notion of
'continuity'... seems to play an important role
in algorithm design.  In [Sch 4] we note that programs
will commonly be structured as nests of loops; many of
these loops ... realize some set-theoretic expression
E = E(a) by applying a map M = M_a repeatedly until E
emerges as a fixed point of M.  The efficiency of
programs having this structure can often be improved
by noting that within an 'outer' loop L_out which
contains an 'inner' loop L_in producing the value E(a),
the parameters a of E(a) are varied only slightly.
An observation of this kind often allows one to
restructure L_in for efficiency by calculating E using
its available previous value, which calculation can
of course be substantially more rapid than calculation
of E 'from scratch' would be.  [Moreover, a speedup of
this kind may still be realized even when the parameters
a do not vary 'slightly' within L_out, if they can be
made to do so by restructuring L_out.] ...  This line
of thought makes it clear that an algorithm for evaluating
E = E(a) will be of particular interest if it has
good continuity properties.  Suppose for example that
E(a) is calculated as the fixed point of a transformation
M_a.  There will in general be many transformations
M_a, M'_a, M''_a, ... all of which have the value E(a) as
fixed point; among these transformations [we] will often
be particularly interested in those M_a for which the
sequence E(a), M_ā(E(a)), M_ā(M_ā(E(a))), ... leads after
comparatively few iterations to the fixed point E(ā)
of M_ā (where we assume that the parameter values a and ā
differ only slightly).  This line of thought points up a
problem area in algorithmic analysis which has not
yet been explored systematically.

It  is  instructive  to  consider  one  or  two  cases  in 
which  algorithms  or  data  structures  having  useful 
properties  of  continuity  are  known  or  can  be  devised. 
First  consider  sorting,  and  the  problem  of  maintaining 
the  sorted  form  of  a  set  s  to  which  modifications  are 
continually  being  made  by  addition  and  deletion.  If 
there  are  n  elements  in  s,  the  bubble  sort  will 
correct  for  an  insertion  or  deletion  in  approximately 
n/2 steps.  However, if the sorted form of s is kept
as a balanced tree, one can correct for an insertion
or deletion in log n steps.

Next consider the minimum min s of a set s of integers.
After an insertion s = s + {x} one can update min s by
executing

     min = if x lt min then x else min;

and after a deletion s = s - {x} by executing

     min = if x ne min then min else (sort s)(1);

Since  in  many  situations  the  minimum  of  s  will  rarely 
be  deleted,  it  will  rarely  be  necessary  in  using  this 
procedure  to  generate  the  sorted  form  of  s.  On  the 
other  hand,  if  the  minimum  of  s  is  used  in  a  process,  as 
for  example  a  selection  sort,  which  invariably  deletes 
the minimum, then one wants an algorithm which has good
'worst case' rather than good 'typical case' continuity
properties.  In such a situation, it is reasonable to
arrange s as a vector v = tree s having the implicit
tree property, i.e. v(n) < v(2*n) and v(n) < v(2*n+1).
Then the minimum of s is necessarily v(1), i.e. can be
expressed as (tree s)(1).  Note that in approaching the
quantity min s in this way, we have essentially factored
the function min into the product of two functions, of
which the first, tree, is continuous, while the second
(indexing) can be performed rapidly.
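The two continuity arguments in the quotation can be sketched in Python as follows. IncrementalMin applies the quoted update rules for min s, using min(s), a linear scan, in place of (sort s)(1) for the rare recomputation. heap_selection_sort uses the standard heapq module as the vector tree s. heapq indexes from 0, so its invariant is heap[k] <= heap[2k+1] and heap[k] <= heap[2k+2] rather than the 1-based form quoted, and it permits equal elements. The class and function names are illustrative.

```python
import heapq


class IncrementalMin:
    """Maintains min s while s changes by single insertions and deletions."""

    def __init__(self, s):
        self.s = set(s)
        self.min = min(self.s) if self.s else None
        self.recomputations = 0          # how often the expensive case occurred

    def insert(self, x):
        # min = if x lt min then x else min;
        self.s.add(x)
        if self.min is None or x < self.min:
            self.min = x

    def delete(self, x):
        # min = if x ne min then min else (sort s)(1);
        self.s.discard(x)
        if x == self.min:
            self.recomputations += 1
            self.min = min(self.s) if self.s else None


def heap_selection_sort(s):
    """Selection sort that always deletes the minimum.  The list v plays the
    role of tree s: after heapify, v[0] is the minimum, and each deletion of
    the minimum restores the tree property in O(log n) steps."""
    v = list(s)
    heapq.heapify(v)
    return [heapq.heappop(v) for _ in range(len(v))]
```

IncrementalMin does well when the minimum is rarely deleted. The heap gives the 'worst case' guarantee needed when, as in selection sort, every deletion removes the minimum.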


In addition to Schwartz's observations above, we note
that it is sometimes necessary to restructure a program loop
L containing uses of an expression e in order to exploit the
continuity properties of e.  As an example of this, consider
a loop

(30)  (∀x ∈ s)
          ...
          c := {y ∈ T | f(y) < x};
      end ∀;

containing repeated calculations of

(31)  c = {y ∈ T | f(y) < x}

and no definitions of either f or T.  Unfortunately (31)
cannot be reduced within (30), since x is not an induction
variable of c.  Thus the net cost of computing c within (30)
is O(#s × #T).  Note, however, that by selecting x values
from s in sorted order, we can make x inductive and can
differentiate (30) using the method (28*) of Chapter 2(D).
The cost of ordering s is O(#s × log #s), while the expense
of keeping c available within (30) after reduction is
O(#T × log #T + #s), and this usually represents an
improvement.
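The restructured loop can be sketched in Python as follows. The sketch assumes that the body of (30) does not depend on the order in which x is drawn from s. The original iteration order is arbitrary, so sorted order is a legal choice. T is sorted by f once, and because x only increases, c only grows. Each element of T is therefore added exactly once, which gives the O(#T × log #T + #s) maintenance cost on top of the O(#s × log #s) cost of ordering s. The helper name sweep and the callback body are hypothetical.

```python
def sweep(s, T, f, body):
    """Run body(x, c) for every x in s, with c = {y in T | f(y) < x}.

    x is drawn from s in increasing order, so x is inductive and c can be
    maintained by additions alone instead of being recomputed for each x.
    body must not modify c."""
    ordered_T = sorted(T, key=f)          # O(#T log #T), done once
    c = set()
    i = 0                                 # invariant: c == set(ordered_T[:i])
    for x in sorted(s):                   # O(#s log #s)
        while i < len(ordered_T) and f(ordered_T[i]) < x:
            c.add(ordered_T[i])           # derivative code: c := c with y
            i += 1
        body(x, c)
```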

5.  Another topic relevant to formal differentiation
concerns expressions which cannot be reduced profitably.
We hypothesize that the lower bounds on the running time of
algorithms whose base form versions depend heavily on such
expressions will be predictably high.  Substantiation of
this hypothesis might then lead to the discovery of more
general relationships between the lower bounds on the running
time of algorithms and the continuity properties of the
expressions used in their base form rubble versions.

Of course, the problems just mentioned are very difficult,
since finding lower bounds for a particular algorithm,
and  determining  the  full  continuity  properties  of  a  single 
expression  e  are  often  hard  problems.   Note  that  while 
establishing  reducibility  for  e  can  be  achieved  merely  by 
discovery  of  some  profitable  derivative  code,  determining 
that  e  cannot  be  reduced  requires  the  much  more  difficult 
discovery  of  a  proof  that  no  profitable  derivative  code 
exists  at  all.   Proofs  of  this  sort  are  little  understood, 
however,  and  the  most  we  have  attempted  to  do  in  this  thesis 
is  provide  a  modicum  of  insight  by  considering  example 
expressions  whose   continuity  properties    may  be  determined 
in  a  reasonable  way. 

Among the various expressions examined for their
continuity properties, there arise two main obstacles to
speedup due to FD.  The most common snag results from the
high cumulative cost of performing inexpensive update
operations for too many stored calculations of an expression
involving discontinuities.  Speedup may also be limited
when reduction of an expression e_0 depends on
reduction of a whole chain of auxiliary expressions e_1,...,e_n
in which, for i = 1,...,n, e_i must be reduced to make
reduction of e_(i-1) profitable.  (Recall that this situation
arose in the first formulation of Rule 2, cf. p. 301.)
Although a chain of this sort may be necessary to minimize
the update operations directly involving e_0, the net cost
of reducing too many expressions may be prohibitive.  In
this case, an alternative approach which minimizes the
number of auxiliary expressions at the cost of performing
some redundant or unnecessary operations is preferred.
(Rule 2 was reformulated in this way on p. 304.)

The preceding ideas can be illustrated using a few
relevant examples which have not been discussed in the
previous chapters.  Consider the set union

(32)  c1(x) = f(x) + t

which is executed repeatedly within a program loop L.
Suppose that within L, f is a region constant, t varies
only by slight set additions and deletions, and x is
modified ad lib.  Suppose also that f is the edge map for
a directed graph having Dom f as its vertices, and t is a
subset of these vertices.  Since (32) is discontinuous
relative to changes in x, formal differentiation requires
that a separate calculation of (32) be stored in a map c1(x)
for each x ∈ Dom f.  This may be achieved by performing
the following initialization code

(33)  c1 := nullset;
      (∀y ∈ Dom f)
          c1(y) := f(y) + t;
      end ∀;

on entrance to L.  Then within L, whenever t changes by
t := t + Δ or by t := t - Δ, we can execute the corresponding
form of the following prederivative code:

(34)  (∀w ∈ (Δ - t), y ∈ {u ∈ Dom f | w ∉ f(u)})   (before t := t + Δ)
          c1(y) with w;
      end ∀;

      (∀w ∈ (Δ * t), y ∈ {u ∈ Dom f | w ∉ f(u)})   (before t := t - Δ)
          c1(y) less w;
      end ∀;

Examination of (33) shows that the iteration count N of
the loop L must be greater than #Dom f, or else the original
cost of the repeated calculation (32) will be less than that of (33).
But even ignoring the cost of the preprocessing code (33), we see
that the expense of computing the prederivative code (34)
is at least proportional to the cardinality of the set

(35)  c2(w) = {u ∈ Dom f | w ∉ f(u)}.

For FD to be worthwhile we require that the cost of computing
(32), which is proportional to #f(x) + #t, be significantly
greater than the cost of (35) (which is O(#c2(w))).
But this will only be true when the graph f is almost
completely connected, i.e., when f(u) ≈ Dom f for all u ∈ Dom f.
Suppose now that f is almost completely connected.  Then
for FD to be profitable we must reduce (35), which,
fortunately, can only be done profitably when f is almost
completely connected (cf. p. 340).  To reduce (35) we need
to perform the following initialization immediately
after (33):
(36)  c2 := nullset;
      (∀y ∈ Dom f)
          c2(y) := {u ∈ Dom f | y ∉ f(u)};
      end ∀;

Note that the cost of computing (36) is proportional to
(#Dom f)^2.  Once c2 is available, we can replace the costly
setformer {u ∈ Dom f | w ∉ f(u)} occurring within (34) by the
retrieval c2(w).
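Here is a minimal Python sketch of (32)-(36). It assumes that f is a dict from each vertex to its set of successors and that t changes one vertex at a time. c1 and c2 are built as in (33) and (36). Each addition or deletion then runs the corresponding form of (34), with the setformer (35) replaced by the retrieval c2(w). The class name UnionMaps is hypothetical.

```python
class UnionMaps:
    """Keeps c1(y) = f(y) + t for every vertex y while t changes by single
    additions and deletions.  f maps each vertex to its set of successors;
    t and every w passed to add/delete must be vertices (members of Dom f)."""

    def __init__(self, f, t):
        dom = set(f)
        if not set(t) <= dom:
            raise ValueError("t must be a subset of Dom f")
        self.f, self.t = f, set(t)
        self.c1 = {y: f[y] | self.t for y in dom}                        # (33)
        self.c2 = {w: {u for u in dom if w not in f[u]} for w in dom}  # (36)

    def add(self, w):
        """Prederivative code for t := t + {w}."""
        if w in self.t:               # w already in t: no c1(y) changes
            return
        for y in self.c2[w]:          # retrieval c2(w) replaces the setformer (35)
            self.c1[y].add(w)         # c1(y) with w
        self.t.add(w)

    def delete(self, w):
        """Prederivative code for t := t - {w}."""
        if w not in self.t:           # w not in t: no c1(y) changes
            return
        for y in self.c2[w]:
            self.c1[y].discard(w)     # c1(y) less w
        self.t.discard(w)
```

The (#Dom f)^2 cost is paid once in the constructor. Each later change of t costs O(#c2(w)), which is small exactly when f is almost completely connected.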

It is interesting to consider how to handle (32) in
the following more restricted context.  Suppose that t is
an upper bound on the range of x values within L (i.e.,
every value of x lies in t) and that t only varies by
differential set deletions.  To exploit this new situation,
we can replace the initialization code (33) and (36) by the
following more efficient code:

(37)  c1 := nullset;
      (∀y ∈ t)
          c1(y) := f(y) + t;
      end ∀;

      c2 := nullset;
      (∀y ∈ t)
          c2(y) := {u ∈ t | y ∉ f(u)};
      end ∀;

Within L, whenever t is modified by a change t := t - Δ, we
can execute the prederivative

(38)  (∀w ∈ (Δ * t), y ∈ c2(w))
          c1(y) less w;
      end ∀;

which is less expensive to compute than (34).
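The deletion-only variant (37)-(38) can be sketched in the same style, again assuming single-vertex deletions. Here c2 is deliberately not maintained, so it may still list vertices that are leaving or have already left t. The counter wasted records the resulting updates to entries of c1 that will never be used again. These are the unnecessary update operations discussed next. The class name and the counter are illustrative assumptions.

```python
class DeletionOnlyUnion:
    """Variant (37)-(38): every x used in L lies in t, and t only shrinks."""

    def __init__(self, f, t):
        self.f, self.t = f, set(t)
        self.c1 = {y: f[y] | self.t for y in self.t}                          # (37)
        self.c2 = {w: {u for u in self.t if w not in f[u]} for w in self.t}   # (37)
        self.wasted = 0   # updates to c1(y) for a y that is leaving or has left t

    def delete(self, w):
        """Prederivative code (38) for t := t - {w}; c2 itself is not updated."""
        if w not in self.t:
            return
        for y in self.c2[w]:
            if y == w or y not in self.t:
                self.wasted += 1      # c1(y) will never be used again
            self.c1[y].discard(w)     # c1(y) less w
        self.t.discard(w)
```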

Note, however, that within (38) c2(w) can contain
elements belonging to Δ * t, so that unnecessary update
operations for c1 may occur.  To avoid executing
these undesirable operations, we can perform the
following prederivative code for c2,

(39)  (∀w ∈ (Δ * t), y ∈ {u ∈ t | u ∉ f(w)})
          c2(y) less w;
      end ∀;

just prior to (38).  But by introducing (39), an additional
expression,

(40)  c3(w) = {u ∈ t | u ∉ f(w)}

must be reduced.

To reduce (40) we perform the initialization

(41)  c3 := nullset;
      (∀y ∈ t)
          c3(y) := {x ∈ t | x ∉ f(y)};
      end ∀;

immediately after (37).  Consequently, the setformer
{u ∈ t | u ∉ f(w)} occurring within (39) may be replaced
by the retrieval c3(w).  However, even after this, the
fact that c3 depends on an ever decreasing set t implies
that unnecessary update operations of c2 can occur within
(39) unless c3 is first updated to correct for this problem.
Unfortunately the required update calculation is

(42)  (∀w ∈ (Δ * t), y ∈ {u ∈ t | w ∉ f(u)})
          c3(y) less w;
      end ∀;

which can also include unnecessary updates of c3.

Observe that a continued quest to eliminate unnecessary
operations will only introduce more code selected from the
cycle (38), (39), (42), and the cumulative expense of
computing this code may be high.  Thus it seems prudent to
avoid this approach and instead be satisfied with the
imperfect code (42), in which the setformer {u ∈ t | w ∉ f(u)}
is replaced by the retrieval c2(w).

The preceding example suggests that workset algorithms
can sometimes reach maximum efficiency by actually introducing
local inefficiencies, e.g., by letting these algorithms
depend on worksets which are larger than absolutely necessary.

iii.  Summary 

We have now presented a thesis investigating formal
differentiation, a program optimization technique which
generalizes and reformulates John Cocke's method of strength
reduction, and provides a convenient framework with which
to implement Jay Earley's 'iterator inversion'.  Algorithms
to implement formal differentiation both automatically and
semiautomatically for programming languages ranging from
Fortran to SETL have been given.  However, since FD seems
best suited for SETL, we have studied set theoretic
formal differentiation in depth, and have presented a
comprehensive semiautomatic implementation design for a
subset of SETL.  This design includes an interactive source
to source transformation program improvement system to be
used for performing experiments in algorithm derivation
using FD.  It is expected that empirical evidence gathered
from such experiments will lead eventually to a full
mechanization of FD for SETL.

In  contrast  to  other  research  in  program  transformations, 
we  have  shown  that  FD  is  unusual  in  many  respects;  e.g., 

1.  It  may  be  applied  over  a  large  spectrum  of  language 
levels  and  in  wide  ranging  contexts  within  these  languages. 

2.  It  can  realize  swift  convergence  from  a  very  high  level 
inefficient  form  of  an  algorithm   to  a  much  lower  and  more 
efficient  implementation  version. 

3.  FD  can  be  implemented  systematically. 

4.  We  can  estimate  the  speedup  due  to  transformations  applied 
by  our  proposed  SETL  FD  implementation,  and  have  shown  this 
speedup  to  be  as  great  as  an  order  of  magnitude. 

We have illustrated our proposed FD system for SETL by
considering and improving eight sample SUBSETL programs.
We feel that these initial case studies lend strong support
to further efforts to fully automate and incorporate set
theoretic formal differentiation as part of an optimizing
compiler.  There are encouraging indications that this goal
will be reached.  When this finally happens, it will represent
real progress towards the development of optimization
algorithms envisioned by Schwartz "which explore spaces of
program transformations as freely as a manual programmer
does" [Sch 6].
Appendix A.  SETL and SUBSETL

In the present section, we summarize the principal
features of a version of the SETL language used throughout
this thesis.  Much of this description has been taken
from [Sch 1] pp. 72-80 and from [DGS 2].  However, some of
the latest changes occurring in the official SETL language
are absent, and we do not regard this précis as a
reference to the current standard SETL language.

As  the  name  suggests,  SUBSETL  is  a  subset  of  the 
version  of  SETL  described  in  this  section.   We  use  an 
asterisk  to  the  left  of  an  item  to  denote  a  SETL  feature 
not  included  in  SUBSETL. 

Basic Objects

Sets and atoms; sets may have atoms or sets as members.
Atoms may be

   Integer              Examples:  0, 2, -3
 * Real                 Examples:  9., 0.9, 0.9E-5
   Boolean              Examples:  True, False
 * Character strings    Examples:  'aeiou', 'spaces'
 * Blank (created by function newat).

(Note: the special undefined blank atom is Ω.)
Basic Operations for Atoms

Integers:   arithmetic:  +, -, *, /, **, // (remainder)
            comparison:  =, ¬=, <, >, ≤, ≥
                         (Results are True, False, or Ω.)
            other:       max, min, abs
            Examples:  5//2 is 1;  3 max -1 is 3;  abs -2 is 2.

*Reals:     Above arithmetic operations (with the exception of //)
            plus exponential, log, and trigonometric functions.

*Booleans:  logical:  &, or, exor, implies, ¬
            (Any value other than True is considered False.)
            logical constants:  True, False

*Strings:   + (catenation), * (repetition), a(i:j) (extraction),
            # (size), nullchar (empty string).
            Examples:  'a' + 'b' is 'ab';  2 * 'ab' is 'abab';
            'abc'(1:2) is 'ab';  'abc'(2:2) is 'bc';
            'abc'(2) is 'b';  #'abc' is 3;  #nullchar is 0.

General:    Any two atoms may be compared using = or ¬=;
          * atom a tests if a is an atom.

Basic Operations for Sets

∈, ∉ (membership tests);  nullset (empty set);
∋ (arbitrary element);  # (number of elements);
=, ¬= (equality tests);  incs (inclusion test);
with, less (addition and deletion of element);
pow(a) (set of all subsets of a);
* npow(k,a) (set of all subsets of a having exactly k elements)

Examples:  a ∈ {a,b} is True,  a ∈ nullset is False,
           ∋ nullset is Ω,  ∋ {a,b} is either a or b,  #{a,b} is 2,
           # nullset is 0,  {b} with a is {a,b},  {a,b} less a is {b},
           {a,b} less c is {a,b},  {a,b} incs {a} is True,
           pow({a,b}) is {nullset, {a}, {b}, {a,b}},
           npow(2,{a,b,c}) is {{a,b},{a,c},{b,c}}.
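If you want to experiment with these operations, pow and npow can be sketched in Python with itertools.combinations. The function names setl_pow and setl_npow are hypothetical. Subsets are represented as frozensets because the members of a Python set must be hashable.

```python
from itertools import chain, combinations


def setl_npow(k, a):
    """npow(k, a): the set of all k-element subsets of a."""
    return {frozenset(c) for c in combinations(a, k)}


def setl_pow(a):
    """pow(a): the set of all subsets of a, i.e. the union of npow(k, a)
    for k = 0, ..., #a."""
    a = list(a)
    return set(chain.from_iterable(setl_npow(k, a) for k in range(len(a) + 1)))
```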
Tuples

Ordered tuples are treated as SETL objects of a different
type than sets; e.g., tuples may have some components
undefined.

Operations on tuples:

∈, ∉ (membership tests);  nulltuple (empty tuple);
∋ (arbitrary element).

Tuple former:  If x,y,...,z are n SETL objects, then
t = [x,y,...,z] is the n-tuple with the indicated components.

#t      is the number of components of t;
t(k)    is the k-th component of t;
t(i:j)  is the tuple whose components, for 1 ≤ k ≤ j, are t(i+k-1);
+       is the concatenation operator for tuples.

Examples:  a(n) ∈ t is an abbreviation for ∃1 ≤ n ≤ #t | t(n) = a.
If t = [a,b] and t' = [a,c], then
T = t + t' = [a,b,a,c],  T(3:2) = [a,c].

Tuple components may be modified by writing t(j) = x;
an additional component may be concatenated to t
by writing t(#t + 1) = x;
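The extraction t(i:j) takes j components starting at position i, as the example T(3:2) = [a,c] shows. Here is a one-line Python sketch. It assumes that a 0-based Python sequence holds the 1-based SETL tuple or string, and the name setl_slice is hypothetical:

```python
def setl_slice(t, i, j):
    """SETL t(i:j): the j components t(i), ..., t(i+j-1), with SETL's
    1-based positions mapped onto Python's 0-based slicing."""
    return t[i - 1 : i - 1 + j]
```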
Set  Definition 

By  enumeration:  {a,b,...,c}. 

Set  former:  {e(x,,,..,x  ):  x,ee,  ,X2Ge2  (x,  )  ,  .  .  .x  Ge^  (x,  ,  .  .x   -j^) 

I  c(x-^,  .  .  .  ,x^)  }  . 

Tuples  may  also  be  defined  by  analogous  tuple-formers, 

[e  (x^,  ,  .  .  ,x^,mj^,  ,  .  .  ,m^)  :  x^  (m^)Ge-|^, 

.  .  .  ,  ^n^'^n^^^n^'^l'  *  "  '  '^n-l'^^i'  •  '  '  '%-l^ 
I  c(x^,  .  .  .  ,x^,mj^,  .  .  .  ,m^)  ] 

523 


The range restrictions x ∈ a(y) can have the alternate numerical form

min(y) ≤ x ≤ max(y)

when a(y) is an interval of integers.

If t is a tuple, the form x(n) ∈ t can be used; see Iteration Headers below for additional detail.

Optional forms include
{x ∈ a | C(x)}  equivalent to  {x: x ∈ a | C(x)};  and
{e(x): x ∈ a}  equivalent to  {e(x): x ∈ a | True}.
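A set former with a dependent range restriction behaves like a nested comprehension. The sketch below, in Python as a stand-in for SETL, shows the n = 2 case; the helper set_former is hypothetical, and range(x, 4) plays the role of the numerical form x ≤ y ≤ 3.

```python
def set_former(e, e1, e2, c):
    """{e(x1,x2): x1 ∈ e1, x2 ∈ e2(x1) | c(x1,x2)}, the n = 2 case."""
    return frozenset(e(x1, x2) for x1 in e1 for x2 in e2(x1) if c(x1, x2))

# {[x, y]: x ∈ {1,2,3}, x ≤ y ≤ 3 | x + y is even}; SETL pairs become Python tuples
pairs = set_former(lambda x, y: (x, y),
                   {1, 2, 3},
                   lambda x: range(x, 4),
                   lambda x, y: (x + y) % 2 == 0)
print(sorted(pairs))
```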

Functional Application (of a set of ordered pairs, or a programmed, value-returning function):

f{a} is {if #p > 2 then p(2:) else p(2): p ∈ f | if type p ≠ tupl then False else (#p) ≥ 2 & p(1) = a},

i.e., is the set of all x such that [a,x] ∈ f.

f(a) is:  if #f{a} = 1 then ∋f{a} else Ω,

i.e., is the unique element of f{a}, or is the undefined atom Ω;

f[a] is the union over x ∈ a of the sets f{x},

i.e., the image of a under f.
More generally:

f(a,b) is g(b) and f{a,b} is g{b}, where g is f{a};
f[a,b] is the union over x ∈ a and y ∈ b of f{x,y}.
If f is a value-returning function, then

f{a,b} = {f(a,b)},  f[a] = {f(x): x ∈ a},  etc.

Constructions like f{a,[b],c}, etc. are also provided.
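To illustrate the three forms of application for a map stored as a set of ordered pairs, here is a minimal Python sketch (a stand-in for SETL). It handles pairs only, not the longer tuples covered by p(2:), and the helper names image_set, apply_map, and image are hypothetical; None stands in for Ω.

```python
def image_set(f, a):
    """f{a}: the set of all x such that [a, x] ∈ f."""
    return frozenset(p[1] for p in f if p[0] == a)

def apply_map(f, a):
    """f(a): the unique element of f{a}, or None (standing in for Ω)."""
    s = image_set(f, a)
    return next(iter(s)) if len(s) == 1 else None

def image(f, a):
    """f[a]: the union over x ∈ a of f{x}, i.e. the image of a under f."""
    return frozenset().union(*(image_set(f, x) for x in a))

f = {(1, "p"), (1, "q"), (2, "r")}   # a multivalued map as a set of pairs
print(apply_map(f, 2))               # 'r'
print(apply_map(f, 1))               # None: f{1} has two elements
```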
Compound Operator

[op: x ∈ s]e(x) is e(x1) op e(x2) op ... op e(xn),

where s is {x1,...,xn}.

This construction is also provided in the general form

[op: x1 ∈ e1, x2 ∈ e2(x1), ..., xn ∈ en(x1,...,xn-1) | C(x1,...,xn)]e(x),

where the range restrictions may also have the alternate numerical form, or the form appropriate for tuples.

Examples:  [max: x ∈ {1,3,2}](x+1) is 4,

[+: x ∈ {1,3,2}](x+1) is 9,

[+: x(n) ∈ a]x is the SETL form of a(1) + a(2) + ... + a(n),

[op: x ∈ nullset]e(x) is Ω.
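The compound operator is essentially a fold of op over the values e(x). A minimal Python sketch (a stand-in for SETL, with the hypothetical helper compound) reproduces the examples above, returning None for Ω on an empty range. Since a set has no inherent order, op is assumed to be associative and commutative, as max and + are.

```python
from functools import reduce
import operator

def compound(op, s, e, cond=lambda x: True):
    """[op: x ∈ s | cond(x)] e(x); returns None (Ω) when no x qualifies."""
    values = [e(x) for x in s if cond(x)]
    return reduce(op, values) if values else None

print(compound(max, {1, 3, 2}, lambda x: x + 1))           # 4
print(compound(operator.add, {1, 3, 2}, lambda x: x + 1))  # 9
print(compound(operator.add, set(), lambda x: x))          # None
```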

Quantified Boolean Expressions

∃x ∈ a | C(x)        ∀x ∈ a | C(x)

General form is
∃x1 ∈ a1, x2 ∈ a2(x1), ∀x3 ∈ a3(x1,x2), ... | C(x1,...,xn)

where the range restrictions may also have the alternate numerical form, or the form appropriate for tuples.
Evaluation of

∃x ∈ a | C(x)
sets x to the first value found such that C(x) = True.
If there is no such value, x becomes Ω.
The alternate forms

∃min ≤ x ≤ max,  ∃max ≥ x ≥ min,  ∃max ≥ x > min,  x(n) ∈ t,  etc.,

of range restrictions may be used to control the order of search.
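Because evaluating ∃x ∈ a | C(x) also binds x to the first witness found, it returns more than a truth value. The following Python sketch (a stand-in for SETL; the helper exists is hypothetical) returns both, and uses a descending range to show how an ordered range restriction such as ∃max ≥ x ≥ min controls which witness is found first.

```python
def exists(s, c):
    """∃x ∈ s | c(x): returns (found, x), where x is the first witness or None (Ω)."""
    for x in s:
        if c(x):
            return True, x
    return False, None

# ∃10 ≥ x ≥ 1 | x divisible by 4: the search runs 10, 9, 8, ... and stops at 8
found, x = exists(range(10, 0, -1), lambda v: v % 4 == 0)
print(found, x)   # True 8
```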
Conditional Expressions

if bool1 then expn1 else if bool2 then expn2 ... else expnn

Statements are punctuated with semicolons.

Assignment  and  Multiple  Assignment  Statements 

a := expn;

f{exp} := expn;  is the same as

f := {p ∈ f | p(1) ≠ exp} + {[exp,x]: x ∈ expn};
f(exp) := expn;  is the same as  f{exp} := {expn};
f(a,b) := expn;  f{a,b} := expn;  etc. also are provided.

* [a,b] := expn;  is the same as  a := expn(1);  b := expn(2:);

* [a,b,...,c] := expn;  [a,[b,c],...,d] := expn;  etc.

are also provided.

* [f(a), g{b}] := expn;  is the same as  f(a) := expn(1);

g{b} := expn(2:);
Generalized forms:

* [f(a), g{b,c}, ..., h(d)] := expn;

* [f(a), [g{b,c}, h(d)], ..., k(e)] := expn;  etc. also are

provided.
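The map-assignment identities above can be checked directly when a map is a set of pairs. This Python sketch (a stand-in for SETL) returns new values rather than mutating, in keeping with SETL's value semantics; the helpers assign_image_set, assign_apply, and destructure are hypothetical, and destructure follows the stated rule b := expn(2:).

```python
def assign_image_set(f, key, values):
    """f{key} := values  is  f := {p ∈ f | p(1) ≠ key} + {[key, x]: x ∈ values}."""
    return {p for p in f if p[0] != key} | {(key, x) for x in values}

def assign_apply(f, key, value):
    """f(key) := value  is  f{key} := {value}."""
    return assign_image_set(f, key, {value})

def destructure(t):
    """[a, b] := t  is  a := t(1); b := t(2:)  (the tail from position 2)."""
    return t[0], t[1:]

f = {(1, "p"), (1, "q"), (2, "r")}
f = assign_apply(f, 1, "z")   # both old values of f at 1 are replaced
print(sorted(f))              # [(1, 'z'), (2, 'r')]
```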

* Use of General Expressions on Left-Hand Side of Assignment Statements (sinister calls).

e(x1,...,xn) := y;  must be a no-op if executed immediately after  y := e(x1,...,xn);  and vice versa. The use

op op' x := y;

of a product operator on the left-hand side of an assignment expands as

t := op' x;
op t := y;
op' x := t;

with sinister rules for multiparameter compounding. These rules allow user-defined functions to be used quite generally on the left-hand side of assignment statements. The 'left-hand' significance of a function is often deducible from its standard right-hand side form, but may be varied by using specially designated code blocks which are executed only if the function is called from right-hand or left-hand position, respectively. These have the respective forms:

(load) block;      (executed only if the function is called from the right-hand side of an assignment)

(store x) block;   (executed only if the function f is called from
                    f(param1,...,paramn) := x;).
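To make the expansion of op op' x := y concrete, the Python sketch below (a stand-in for SETL) uses two hypothetical tuple operators: hd (first component) and tl (all components after the first). Each has a load form and a store form, in the spirit of the (load) and (store x) blocks. The assignment hd tl x := y then expands into t := tl x; hd t := y; tl x := t. The sketch also checks the no-op rule that storing a just-loaded value changes nothing.

```python
def hd_load(t):
    return t[0]

def hd_store(t, y):
    """hd t := y; returns the updated tuple (SETL value semantics)."""
    return [y] + t[1:]

def tl_load(t):
    return t[1:]

def tl_store(t, y):
    """tl t := y; keeps the first component and replaces the rest."""
    return t[:1] + y

def assign_hd_tl(x, y):
    """hd tl x := y, expanded by the sinister rule."""
    t = tl_load(x)          # t := tl x;
    t = hd_store(t, y)      # hd t := y;
    return tl_store(x, t)   # tl x := t;

x = [1, 2, 3]
x = assign_hd_tl(x, 9)      # x is now [1, 9, 3]
```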

* Commonly Used Operators Having Special Side Effects

x := expn   has the same value as expn and assigns this value to x;
x in s;     same as  s := s with x;
x from s;   same as  x := ∋s;  s := s less x;
x out s;    same as  s := s less x;
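The expansion of x from s can be mirrored in Python with value semantics: pick an arbitrary element, then remove it. In this sketch (a stand-in for SETL), the helpers from_ and out are hypothetical, and None stands in for Ω when s is empty, since ∋nullset is Ω.

```python
def from_(s):
    """x from s;  is  x := ∋s; s := s less x;  returns (x, new s)."""
    if not s:
        return None, s            # ∋nullset is Ω
    x = next(iter(s))             # an arbitrary element
    return x, s - {x}

def out(x, s):
    """x out s;  is  s := s less x."""
    return s - {x}

x, s = from_({1, 2, 3})
print(x in {1, 2, 3}, len(s))     # True 2
```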
* Use of Code Blocks Within Expressions

If block is a section of text which could be the body of a function definition, then [; block] is a valid expression which both defines and calls this function. Such code block expressions can be used freely within other expressions.

Control Statements

* go to label;

if cond1 then block1 else if cond2 then block2 ... else blockn;

if cond1 then block1 else ... else if condn then blockn;

Iteration Headers

(while cond) block;

* (while cond doing blocka) block;

(∀x1 ∈ a1, x2 ∈ a2(x1), ..., xn ∈ an(x1,...,xn-1) | C(x1,...,xn)) block;
In this last form, the range restriction may have such alternate numerical forms as

min ≤ x ≤ max,  max ≥ x ≥ min,  min ≤ x < max,  etc.,

which control the iteration order.

If t is a tuple, then the iterator of the form
(∀x(n) ∈ t) block;  is available. This is an abbreviation for
(∀1 ≤ n ≤ #t | t(n) ≠ Ω) x := t(n); block;

Iterators of this form may also be used in set formers, compound operators, quantifiers, etc.
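The abbreviation for (∀x(n) ∈ t) skips undefined components. A Python generator (a stand-in for SETL; the helper iter_tuple is hypothetical, with None standing in for Ω) yields each index n together with the component x := t(n).

```python
def iter_tuple(t):
    """(∀x(n) ∈ t): yield (n, x) for 1 ≤ n ≤ #t with t(n) ≠ Ω (None here)."""
    for n in range(1, len(t) + 1):
        x = t[n - 1]
        if x is not None:
            yield n, x

t = ["a", None, "c"]          # component 2 is undefined
print(list(iter_tuple(t)))    # [(1, 'a'), (3, 'c')]
```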
Iterator Scopes

The scope of an iteration or of an else or then block may be indicated either with a semicolon, with parentheses, or in one of the following forms:

end ∀;  end while;  end else;  end if;  etc.,
or:

end ∀x;  end while x;  end if x;  etc.

* Loop Control

quit;  quit ∀x;  quit while;  quit while x;
and

continue;  continue ∀x;  continue while;  continue while x;

The quit statement terminates an iteration; the continue statement begins the next cycle of an iteration.

* Subroutines and Functions (are always recursive)

To Call Subroutine:

sub(param1,...,paramn);
sub[a];  is equivalent to  (∀x ∈ a) sub(x);;

Generalized Forms:

sub(param1, [param2, param3], ..., paramn)
are also provided.

* To Define Subroutines and Functions

Subroutine:

define sub(a,b,c);  text  end sub;
return;  used for subroutine return
Function:

definef fun(a,b,c);  text  end fun;

return val;  used for function return
Infix and prefix forms:

define a infsub b;    text end infsub;

definef a infin b;    text end infin;

define prefsub a;     text end prefsub;

definef prefun a;     text end prefun;


* Namescopes

Scope declarations divide a SETL text into a nested collection of scopes. Scope names are known in immediately adjacent, containing, and contained scopes. Other than this, names are local to the scope in which they occur, unless propagated by include or global statements.

Declaration forms

scope declaration:
scope name; ..., end name;
scope with specified numerical level:
scope n name; ..., end name;
global declaration:
global name1, ..., namen;
global declaration with specified numerical level:
global n name1, ..., namen;
include statement:
include list1, ..., listn;
Example:

* include bigscope1(scope1 x, scope2(-z), scope3(x,y,u[v])),

bigscope2*;
'*' signifies all elements in the scope,
'-' signifies exclusion of those elements listed,
[ ] modifies the 'alias' under which an element is known in the scope in which it is included. Subroutines and functions are scopes of level 0. Macros (see below) are transmitted between scopes in much the same way as variable names. The declaration

owns routname1(x1,...,xn1), routname2(y1,...,yn2), ...

states that the variables xi are stacked when routname1 is entered recursively, the variables yi are stacked when routname2 is entered recursively, etc.

* Macro Blocks

To define a block:  macro mac(a,b);  text  endm mac;
To use:  mac(c,d);

* Initialization

initially block;  (block executed only the first time the process is entered)

* Input-Output

Unformatted Character String:

A SETL file is a pair [st,n], where st is a character string and n an integer designating one of its characters. er is the end-record character; input and output are the standard I/O media; the function record(s) reads a file [st,n] from position n until an er character or the end of the string is encountered in the character string st.
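The file-as-pair model can be sketched in Python (a stand-in for SETL). Two things here are assumptions not stated in the text: that er is a newline, and that record returns the text read together with the updated pair, positioned just past er. The helper name record mirrors the SETL function but is otherwise hypothetical.

```python
ER = "\n"   # assumed end-record character er

def record(f):
    """Read file f = [st, n] from (1-based) position n up to er or the end of st.

    Returns the record text and an updated pair positioned past er (an assumption).
    """
    st, n = f
    end = st.find(ER, n - 1)
    if end == -1:
        end = len(st)
    return st[n - 1:end], (st, end + 2)

f = ("first\nsecond", 1)
rec, f = record(f)          # rec == "first"; f == ("first\nsecond", 7)
rec2, f = record(f)         # rec2 == "second"
```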

* Standard Format I/O

f read name1,...,namen;  using standard format reads from file f = [st,n], starting at position n.

f print expn1,...,expnn;  using standard format transfers the external representation of objects to file f = [st,n], starting at position n as above.

The set {s1,...,sn} is represented as {r1,...,rn}, where ri is the external representation of si. Similarly, the tuple [s1,...,sn] is represented as [r1,...,rn].

If str is a character string identical with the external name under which a file is known to the operating system supporting SETL, then

open str  returns a pair [st,1], where st is the contents of the file str.

close(st, str)

makes the SETL string st into the contents of the external file str.
The following grammar for SUBSETL is a slight modification and subset of the SETLA grammar found in SETL Newsletter #70, pp. 31-34. The notations used are the same as in that newsletter and are as follows:


<Stype>              Denotes a syntactic type
<*Ltype>             Denotes a lexical type
Literal, 'Literal'   Denotes literals
<Stype*>             Denotes indefinitely many repetitions of a syntactic type
<Stype(M,N)>         Denotes a minimum of M and a maximum of N repetitions of a syntactic type.

Lexical Types:

<*Name>      variable name
<*Opname>    operator (+, -, *, /, max, min, ...)
<*Qname>     period-terminated constant (True, False, nullset, nullchar, nulltuple, ...)
<*Logop>     logical operators (>, =, ≠, ≥, incs, ...)
SUBSETL GRAMMAR

<program>     = <declaration*> <block>
<declaration> = <type> (<varlist>);
<varlist>     = <*name> <comname*>
<comname>     = ',' <*name>
<type>        = integer | boolean | tuple | set

<block>       = <statement(1,*)>

<statement>   = if <expn> then <block> <elseif*> else <block> endif;
              = if <expn> then <block> <elseif*> endif;
              = ∀ <iterator> <block> end ∀;
              = (while <expn>) <block> end while;
              = <term> ':=' <expn>;
<term>        = <*name>
              = <*name> <arglist>
<arglist>     = (<expnlist>)
              = {<expnlist>}
<expnlist>    = <expn> <comexpn*>
<comexpn>     = ',' <expn>
<elseif>      = elseif <expn> then <block>
<iterator>    = <firstit(0,1)> <iterlist> <lastit(0,1)>
<firstit>     = <expn> ','
<lastit>      = '|' <expn>
<iterlist>    = <iterexpn> <comiterexpn*>
<iterexpn>    = <*name> ∈ <expn>
              = <expn> <compareop> <*name> <compareop> <expn>
<compareop>   = >
              = ≥
              = <
              = ≤
<comiterexpn> = ',' <iterexpn>
<expn>        = <factor> <*opname> <expn>
              = <factor>
              = if <expn> then <expn> <elsexpn*> else <expn>
<elsexpn>     = else if <expn> then <expn>
<factor>      = <*opname> <factor>
              = ∃ <iterlist> <lastit(0,1)>
              = ∀ <iterlist> <lastit(0,1)>
              = [<*opname>: <iterlist> <lastit(0,1)>] <expn>
              = <atom> <arglist>
              = <atom>
              = <atom> <*logop> <factor>
<atom>        = <term>
              = (<expn>)
              = {<expnlist>}
              = {<iterator>}
              = [<expnlist>]
              = [<iterator>]
              = <*const>
              = <*Qname>
Appendix B. Predicting Speedup for Rule 1

In the following Appendix we will continue the discussion raised in Chapter 2 (C) concerning the difficulties in predicting the program speedup that results from formal differentiation. To illustrate our point we will consider the rule 1 reduction transformation applied to the setformer

(1)  c = {x ∈ s | K(x)}

executed repeatedly in a program loop L (cf. Chapter 2 (C)).

In this case we can facilitate prediction of program speedup by restricting our attention to an asymptotic speed complexity in which the frequency of execution of a program loop L is arbitrarily large in comparison to that of the initialization block to L. This optimistic heuristic obviates the need to consider the cost of any code inserted within an initialization block to L by rule 1. Another possibility is to take the somewhat pessimistic view that within L control will always travel along a path containing a maximal number n of prederivatives c = c+{x ∈ A1 | K(x)}, ..., c = c+{x ∈ An | K(x)} between any two calculations of (1). Then, after waving aside numerous complications, we can formalize a condition under which we expect rule 1 to result in improvement: for the cost of calculating (1) to exceed the maximum cost of computing the update code in L, the following inequality
(2)   $\sum_{i=1}^{n} \#A_i \cdot \mathrm{Cost}(K) < \#s \cdot \mathrm{Cost}(K)$

which simplifies to an equivalent 'win predicate'

(2')  $\sum_{i=1}^{n} \#A_i < \#s$

must hold.
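The following is a minimal Python sketch (a stand-in for SETL) of the two quantities being compared. Evaluating (1) from scratch costs #s evaluations of K, while the n prederivatives cost Σ #A_i evaluations; dividing both sides of (2) by a positive Cost(K) gives (2'). The helpers rule1_wins and maintain, and the sample predicate K, are hypothetical. maintain also checks that the prederivative c := c + {x ∈ A | K(x)} keeps c equal to {x ∈ s | K(x)} as s grows.

```python
def rule1_wins(a_sizes, s_size):
    """Win predicate (2'): the sum of #A_i over the n prederivatives is below #s."""
    return sum(a_sizes) < s_size

def maintain(s, updates, K):
    """Keep c = {x ∈ s | K(x)} current while s := s + A_i, via c := c + {x ∈ A_i | K(x)}."""
    s = set(s)
    c = {x for x in s if K(x)}         # initialization block, outside loop L
    for a in updates:
        c |= {x for x in a if K(x)}    # prederivative: #A_i evaluations of K
        s |= a                         # parameter change
    return s, c

K = lambda x: x % 3 == 0               # hypothetical predicate K
s, c = maintain(range(20), [{20, 21}, {30, 33}], K)
print(c == {x for x in s if K(x)})     # True
print(rule1_wins([2, 2], 20))          # True: 4 < 20
```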

The present thesis will not attempt to provide a method for static determination of (2'). Although we can factor out the term Cost(K) from the win predicate for the rule 1 case, a more general complexity measure cannot be avoided in analyzing more complicated examples of interest. Hence, the current thesis will steer clear of such serious and difficult complexity issues in favor of a more heuristic approach having broad practical applications.
APPENDIX C
FORMAL DIFFERENTIATION TABLES

In this appendix we provide the pattern tables which support the various formal differentiation implementation designs discussed in Chapters 3 and 4. In particular, Section (i) contains the elementary form table (F) and the derivative table (D) used in connection with the Fortran variants of Algorithm 1-2, Algorithm 1, and Algorithm 2 (cf. Chapter 3 (C.2.3, C.3.2)). Section (ii) gives the F and D tables needed for our algorithms implementing set theoretic FD limited to expressions continuous in all of their parameters (cf. Chapter 3 (C.2.3, C.3.2, C.3.3)). Finally, Section (iii) provides the F, D, Init, and Replace tables to be included as part of the more general SETL FD implementation design proposed in Chapter 4.

The entries in all of the tables to be presented consist of pattern specifications used for either pattern matching or macro expansion operations. To specify pattern expressions we will make use of literal symbols, pattern variables, pattern names, and names of code procedures, all combined by balanced parentheses and operations of concatenation, alternation, and predecessor formation. However, to make these tables more readable we will avoid cluttering up patterns with explicit tree structure, and will instead assume that this structure can be determined by an appropriately modified parser. Other notational shortcuts taken for the sake of clarity will be mentioned as we proceed.

The following BNF rules describe the notational details of our nonprocedural pattern language:

<ASSIGN> ::= <Pattern name*> '=' <Pattern expression>
<Pattern expression> ::= <term> |
        <Pattern expression> '|' <term>
<term> ::= <factor> |
        <term> <factor>
<factor> ::= <Literal*> |
        <Pattern variable*> |
        <Pattern name*> |
        <Procedure name*> |
        [<Pattern expression>] |
        (<Pattern expression>)

The lexical categories of the above grammar are defined as follows:

<Pattern name*>: an alphanumeric string beginning with a letter; each pattern name must appear once on the left side of an assignment.

<Literal*>: a string enclosed in quotes;

<Pattern variable*>: an alphanumeric string beginning with a letter; these identifiers can be distinguished from pattern names in that they cannot appear on the left side of an assignment.

<Procedure name*>: an alphanumeric string beginning with the symbol !;

i. FD Tables for Fortran

The following tables, shown in abbreviated form, can be used in connection with Algorithm 1-2 of Chapter 3 (C.2.2) tailored to Fortran (cf. Chapter 3 (C.2.3)). For Fortran FD only two tables, F and D, are required. Each of the three patterns f belonging to F appears as part of a term E = f. We use this notation to indicate that the pattern variable E will match the generated variable v_f used to hold the value of the reduced expression matched by f.

The D table is lined up with the F table so that derivative entries associated with each elementary form f are listed just to the right of the entry for f. As a notational shortcut, we sometimes specify two derivative entries on a single line; e.g., the line

x1 = ± x3    E = ± x3 * x2

indicates the following two entries:

x1 = + x3    E = + x3 * x2

and

x1 = - x3    E = - x3 * x2
Elementary Form Table (F)    Derivative Table (D)
                             Parameter Change     Prederivative

1. E = x1 * x2               x1 = ± x3            E = ± x3 * x2
                             x1 = ± x3 ± x4       E = ± x3*x2 ± x4*x2
                             x2 = ± x3            E = ± x3 * x1
                             x2 = ± x3 ± x4       E = ± x3*x1 ± x4*x1

2. E = x1 / x2               x1 = ± x3            E = ± x3/x2
                             x1 = ± x3 ± x4       E = ± x3/x2 ± x4/x2

3. E = x1 ** x2              x2 = ± x3            E = x1**±x3
                             x2 = ± x3 ± x4       E = (x1**±x3) * (x1**±x4)
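The Fortran entries can be sanity-checked numerically. In the Python sketch below, we read the abbreviated entries as updates: E = ± x3 * x2 as the increment E = E ± x3 * x2, and E = x1**±x3 as the multiplicative update E = E * x1**(±x3). That reading is our assumption about the shorthand. Fractions keep the arithmetic exact, and the helper names are hypothetical.

```python
from fractions import Fraction

def product_rule_holds(x1, x2, x3):
    """Entry 1: E = x1 * x2; when x1 = x1 + x3, apply E = E + x3 * x2 first."""
    e = x1 * x2
    e = e + x3 * x2          # prederivative, executed before the change
    x1 = x1 + x3             # parameter change
    return e == x1 * x2

def quotient_rule_holds(x1, x2, x3):
    """Entry 2: E = x1 / x2; when x1 = x1 + x3, apply E = E + x3 / x2."""
    e = Fraction(x1, x2)
    e = e + Fraction(x3, x2)
    x1 = x1 + x3
    return e == Fraction(x1, x2)

def power_rule_holds(x1, x2, x3):
    """Entry 3: E = x1 ** x2; when x2 = x2 + x3, multiply E by x1 ** x3."""
    e = Fraction(x1) ** x2
    e = e * Fraction(x1) ** x3
    x2 = x2 + x3
    return e == Fraction(x1) ** x2

print(product_rule_holds(3, 4, 5), quotient_rule_holds(3, 4, -5), power_rule_holds(2, 3, -5))
```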

ii. FD Tables for Subsetl Expressions Continuous in All of Their Parameters

The F and D tables shown below can be used with the Subsetl variants of either Algorithm 1-2 (cf. Chapter 3 (C.2.3)) or Algorithm 1 and Algorithm 2 (cf. Chapter 3 (C.3.2, C.3.3)). The form of these tables is the same as that of the tables just given for Fortran.
F Table                       D Table
                              Parameter Change       Prederivative

1. E = x1+x2                  x1 := x1 + A;          E := E + A;
   set union                  x1 := x1 - A;          E := E - (A - x2);
                              x2 := x2 + A;          E := E + A;
                              x2 := x2 - A;          E := E - (A - x1);

2. E = x1*x2                  x1 := x1 + A;          E := E + A * x2;
   intersection               x2 := x2 + A;          E := E + A * x1;

3. E = x1-x2                  x1 := x1 + A;          E := E + (A - x2);
   set difference             x2 := x2 + A;          E := E - (A * x1);
                              x2 := x2 - A;          E := E + (A * x1);

4. E = Pow(x1)                x1 := x1 + A;          E := E + {y+z: y ∈ E, z ∈ Pow(A-x1)};
                              x1 := x1 - A;          E := E - {y ∈ E | A*y ≠ nullset};

5. E = {x ∈ x1 | k}           x1 := x1 + A;          E := E + {x ∈ A | k};
                              f(y1,...,yn) := z;     s0 := {x ∈ x1 | [or: 1 ≤ i ≤ r]([&: 1 ≤ j ≤ n] Pij(x) = yj)};
                                                     E := E - {x ∈ s0 | k};
                                                     postderivative:  E := E + {x ∈ s0 | k};

6. E = [op: x ∈ x1 | k1]k2    x1 := x1 + A;          E := E op [op: x ∈ (A - x1) | k1]k2;
   where op is any
   binary operator

7. E = [+: x ∈ x1 | k1]k2     x1 := x1 - A;          E := E - [+: x ∈ (A * x1) | k1]k2;
   where + is
   addition
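Several of the set-theoretic derivative rules above can be checked by brute force. The following Python sketch (a stand-in for SETL; helper names are hypothetical) applies each prederivative to randomly chosen small sets and compares the result with recomputing E after the parameter change. It covers a selection of entries 1 through 5 and entry 7, with x1, x2, and A drawn from a small universe.

```python
import random

def pow_set(a):
    """All subsets of a, as a set of frozensets."""
    subsets = [frozenset()]
    for x in a:
        subsets += [s | {x} for s in subsets]
    return set(subsets)

def rules_hold(x1, x2, a, k=lambda x: x % 2 == 0, k2=lambda x: x * x):
    checks = []
    # 1. union, x1 := x1 - A:  E := E - (A - x2)
    checks.append(((x1 | x2) - (a - x2)) == ((x1 - a) | x2))
    # 2. intersection, x1 := x1 + A:  E := E + A * x2
    checks.append(((x1 & x2) | (a & x2)) == ((x1 | a) & x2))
    # 3. difference, x2 := x2 + A:  E := E - (A * x1)
    checks.append(((x1 - x2) - (a & x1)) == (x1 - (x2 | a)))
    # 3. difference, x2 := x2 - A:  E := E + (A * x1)
    checks.append(((x1 - x2) | (a & x1)) == (x1 - (x2 - a)))
    # 4. power set, x1 := x1 + A:  E := E + {y+z: y ∈ E, z ∈ Pow(A-x1)}
    e = pow_set(x1)
    checks.append((e | {y | z for y in e for z in pow_set(a - x1)}) == pow_set(x1 | a))
    # 4. power set, x1 := x1 - A:  E := E - {y ∈ E | A*y ≠ nullset}
    checks.append((e - {y for y in e if a & y}) == pow_set(x1 - a))
    # 5. filter, x1 := x1 + A:  E := E + {x ∈ A | k}
    checks.append(({x for x in x1 if k(x)} | {x for x in a if k(x)})
                  == {x for x in x1 | a if k(x)})
    # 7. sum, x1 := x1 - A:  E := E - [+: x ∈ (A * x1)] k2
    checks.append(sum(k2(x) for x in x1) - sum(k2(x) for x in a & x1)
                  == sum(k2(x) for x in x1 - a))
    return all(checks)

def random_set(rng, universe=range(5)):
    return frozenset(x for x in universe if rng.random() < 0.5)

rng = random.Random(0)
print(all(rules_hold(random_set(rng), random_set(rng), random_set(rng)) for _ in range(100)))
```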
F Table                         Parameter Change    Prederivative

8. E = {k1: x ∈ x1 | k2}        x1 := x1+A;         E := E + {k1: x ∈ A | k2};

9. E = {x1 ≤ x ≤ x2 | k}        x1 := x1+A;         if A ≥ 0 then
                                                        E := E - {x1 ≤ x < (x1+A min x2) | k};
                                                    else
                                                        E := E + {x1+A ≤ x < (x1 min x2) | k};
                                                    endif;
                                x1 := x1-A;         if A ≥ 0 then
                                                        E := E + {x1-A ≤ x < (x1 min x2) | k};
                                                    else
                                                        E := E - {x1 ≤ x < (x1-A min x2) | k};
                                                    endif;
                                x2 := x2+A;         if A ≥ 0 then
                                                        E := E + {(x1 max x2) < x ≤ x2+A | k};
                                                    else
                                                        E := E - {(x1 max x2+A) < x ≤ x2 | k};
                                                    endif;
                                x2 := x2-A;         if A ≥ 0 then
                                                        E := E - {(x1 max x2-A) < x ≤ x2 | k};
                                                    else
                                                        E := E + {(x1 max x2) < x ≤ x2-A | k};
                                                    endif;

10. E = [min: x ∈ x1 | k1]k2    x1 := x1+A;         E := E min [min: x ∈ A | k1]k2;
    (resp. max)                                     (resp. max)
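Entries 8 and 10 can be exercised the same way. The following Python sketch (a stand-in for SETL; helper names are hypothetical) checks that adding A to x1 and applying the listed update agrees with recomputation. For entry 10, an empty range yields Ω (None here), and treating Ω as an identity for min is our assumption, not something the table states.

```python
def entry8_holds(x1, a, expr, cond):
    """Entry 8: E = {expr(x): x ∈ x1 | cond(x)}; x1 := x1 + A; E := E + {expr(x): x ∈ A | cond(x)}."""
    e = {expr(x) for x in x1 if cond(x)}
    e |= {expr(x) for x in a if cond(x)}
    return e == {expr(x) for x in x1 | a if cond(x)}

def cmin(s, cond, expr):
    """[min: x ∈ s | cond(x)] expr(x), or None (Ω) for an empty range."""
    vals = [expr(x) for x in s if cond(x)]
    return min(vals) if vals else None

def min_omega(u, v):
    """E min v, treating Ω (None) as an identity; this treatment is an assumption."""
    if u is None:
        return v
    if v is None:
        return u
    return min(u, v)

def entry10_holds(x1, a, cond, expr):
    """Entry 10: x1 := x1 + A; E := E min [min: x ∈ A | cond(x)] expr(x)."""
    e = min_omega(cmin(x1, cond, expr), cmin(a, cond, expr))
    return e == cmin(x1 | a, cond, expr)

print(entry8_holds({1, 2, 3}, {4, 5, -1}, lambda x: x % 3, lambda x: x > 0))
print(entry10_holds({1, 2, 3}, {4, 5, -1}, lambda x: x > 0, lambda x: x * x))
```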
F Table                         Parameter Change    Prederivative

11. E = x1 ∈ x2                 x2 := x2 + A;       if x1 ∈ A then E := true; endif;
                                x2 := x2 - A;       if x1 ∈ A then E := false; endif;

12. E = x1 ∉ x2                 x2 := x2 + A;       if x1 ∈ A then E := false; endif;
                                x2 := x2 - A;       if x1 ∈ A then E := true; endif;

13. E = x1 + x2                 x2 := x2 + x3;      E := E + x3;
    tuple concatenation         x1 := x3 + x1;      E := x3 + E;
                                x2(x3) := x4;       E(#x1 + x3) := x4;
                                x2(x3:x4) := x5;    E(#x1 + x3:x4) := x5;
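The membership rules for adding A to x2 (entries 11 and 12) and the tuple-concatenation rules of entry 13 can be checked in a short Python sketch (a stand-in for SETL; helper names are hypothetical). The component rule shows the one piece of index arithmetic involved. Position x3 of x2 is position #x1 + x3 of E = x1 + x2, which is index #x1 + x3 - 1 in Python's 0-based lists.

```python
def membership_add_holds(x1, x2, a, negated=False):
    """Entries 11/12 under x2 := x2 + A: if x1 ∈ A then E := true (resp. false)."""
    e = (x1 not in x2) if negated else (x1 in x2)
    if x1 in a:
        e = not negated
    new_x2 = x2 | a
    return e == ((x1 not in new_x2) if negated else (x1 in new_x2))

def append_holds(x1, x2, x3):
    """Entry 13, x2 := x2 + x3:  E := E + x3."""
    return (x1 + x2) + x3 == x1 + (x2 + x3)

def prepend_holds(x1, x2, x3):
    """Entry 13, x1 := x3 + x1:  E := x3 + E."""
    return x3 + (x1 + x2) == (x3 + x1) + x2

def component_holds(x1, x2, x3, x4):
    """Entry 13, x2(x3) := x4:  E(#x1 + x3) := x4, with 1-based positions."""
    e = x1 + x2
    e[len(x1) + x3 - 1] = x4       # E(#x1 + x3) := x4 in 0-based form
    x2 = list(x2)
    x2[x3 - 1] = x4                # x2(x3) := x4
    return e == x1 + x2

print(component_holds([1, 2], [3, 4, 5], 2, 9))   # True
```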

iii. FD Tables for Subsetl Expressions Continuous in Some of Their Parameters

The F, Replace, Init, and D tables shown below support the SETL implementation design discussed in Chapter 4 (C). Because these tables require many long and complicated pattern entries, it is no longer convenient to present them using the simple tabular format of Sections i and ii. Instead, they are given separately with the following rule of association: within each table, entries corresponding to the eleven basic pattern forms belonging to F are numbered 1,2,...,11.

In addition, actual pattern names Form1, Form2, ..., Form11 are used to specify basic pattern forms within F.

Exp1, ..., Exp11 are the corresponding pattern names of the Replace table, and denote macros which expand to retrieval terms used to replace occurrences of reducible expressions.
Init1, ..., Init11 are the respective pattern names within Init, and represent macros which expand to initialization code. The D table entries are divided into 11 separate groups, each of which is associated with a different basic form belonging to the F table. These groups are presented in the same order as the corresponding entries of F. Within each group associated with Form_i, i = 1,...,11, we list entries labeled by letters for every continuity parameter x of Form_i, and for each allowable modification to x. Each entry contains expressions for a parameter change pattern and a derivative code macro.

To avoid cluttering our tables with needless detail, the pattern entries displayed for these tables will not adhere strictly to the pattern language rules. Since a modified Subsetl parser can be made to recognize the literal symbols and tree structure of pattern expressions, literals will not be enclosed within quote marks, and brackets will not be used to specify tree structure. Thus, any occurrences of brackets will denote literal symbols. Parentheses, too, will denote literals instead of meta symbols.

For conciseness and readability, we will use macros to help generate pattern expressions. Macros are declared in the following way:

macro name(P1,P2,...,PN);
<exp>
<block>

endm
where name is the macro name, P1,...,PN are parameter names, exp is a pattern expression, and block is a sequence of pattern name assignments. A macro use, name(text1,...,textN), causes texti, i = 1,...,N, to be substituted for each occurrence of Pi within exp and block. Note that a single token may be formed by concatenation from parts separated by the symbol @. A parameter may be part of such a token. Although pattern names defined outside of macros will be considered global within each table, all pattern names defined within a macro block are assumed to be local to each macro use. After macro expansion, exp will replace the macro invocation.
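The parameter substitution and @-concatenation just described can be sketched in a few lines of Python. This is a simplification: the hypothetical helper expand_macro assumes a whitespace-tokenized body and ignores the locality of pattern names defined inside the macro block. Each token is split at @, parts naming a parameter are replaced by the corresponding argument text, and the parts are rejoined into a single token.

```python
def expand_macro(params, body, args):
    """Substitute args for params in a whitespace-tokenized macro body.

    A token is split at '@'; each part that names a parameter is replaced,
    and the parts are then concatenated back into a single token.
    """
    binding = dict(zip(params, args))
    out = []
    for token in body.split():
        parts = token.split("@")
        out.append("".join(binding.get(p, p) for p in parts))
    return " ".join(out)

print(expand_macro(["N", "V"], "Exp@N = V + 1", ["3", "y"]))   # Exp3 = y + 1
```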

Pattern specifications used for the F table incorporated in the FD implementation discussed in Chapter 4 (C) normally contain a procedure name pattern of the form !pname immediately following each occurrence of a pattern variable pvar. Recall that when !pname is encountered during matching, the procedure pname, which validates the subexpression matched by pvar, is executed.

For the sake of clarity, the F table shown below has been pruned of all procedure names. However, we use a naming convention for pattern variables which allows us to restore these missing procedure names systematically. We consider five such procedures in connection with an upgraded F table: Cvar(Indset), Dvar, Svar, Bvar, and Mvar.
The first three of these procedures have already been discussed on p. 426. Cvar is used for validating pattern variables which are continuity parameters. For each continuity parameter xi, the procedure parameter Indset is the name of the induction set associated with xi. The following table gives the rule of correspondence between continuity parameters of the F table and the induction sets defined on p. 456.

Induction Set    Continuity Parameters

IV1              x1, x2, x3, x4, x5
IV3              x6, x7, x8, x9, x10, x11, x12, x13
IV4              F1, F2, F3
IV5              F8, F9
IV6              x1, x2, x3, x4, x5
IV7              F1, F2, F3

The routine Dvar will be used for validating discontinuity parameters. By convention all pattern variables of the F table whose names begin with 'q' are discontinuity parameters and should be followed by an occurrence of the procedure pattern !Dvar.

Svar is used to validate special parameters; these are pattern variables whose names begin with the letter 'k'.

In addition to the three procedures already discussed, we also require two new routines, Bvar and Mvar. Bvar validates all parameters which match variables bound to iterators. All such parameters occurring within the F table use the name x. To implement Bvar we must ensure that whenever a term !Bvar is encountered just after an occurrence of a pattern variable x, the expression matched by x represents a simple variable occurrence.

The routine Mvar is designed to validate pattern variables F4, F5, F6, and F7. Mvar must ensure that any subexpression matched by each of these parameters is a map variable.

Various other procedure names also prove useful in specifying patterns for the Init and D tables. Two of these procedures, zero and lt, are used in connection with dotted pattern variables such as q.. We associate with q. two special counters, #q and $q, whose values are stored in Pfunc('#q') and Pfunc('$q'), where Pfunc is the pattern variable map used in either macro expansion or pattern matching. The counter #q associated with a particular pattern variable q. will have zero as its initial value, and will reflect the number of generated instances of q.. The counter $q will have the same value as #q except when it is reset to some other value constrained by the condition 0 ≤ Pfunc($q) ≤ Pfunc(#q).

We  illustrate  the  use  of  these  counters  by  considering 
the  following  pattern  specifications, 

(1)  Paramsl  e  !zero($y)  Params2 

Params2  =    !2t($y,  #q2)  y.*,  Params2  |  y.* 
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in  connection  with  macro  expansion.   Consider  expansion 
of  Paramsl  using  a  pattern  variable  map  Pfunc   in 
which  Pfunc (#q2)  has  the  value  3.  Then  !zero($y)  will 
cause  Pfunc ($y)  to  be  set  to  zero,  and  expansion  to  proceed 
with  Params2.   The  procedure  i?.t($y,#q2)  will  succeed  when- 
ever the  value  of  $y  is  less  than  the  value  of  #q2;  other- 
wise it  will  fail. 

The first time Lt is called, expansion succeeds: Pfunc(y1) is given a unique generated variable name n1, and the value of $y is incremented. After this the literal ',' is expanded, and Params2 is expanded recursively. Once again Lt succeeds, and expansion of y.* results in assigning the value two to $y and a new name n2 to Pfunc(y2). Next we expand the comma followed by Params2. This third call to Lt also succeeds; y.* is expanded, generating a third name n3; the comma is expanded; and finally Params2 is expanded. However, since $y now has the value 3, Lt fails, and expansion proceeds with the right alternand, y.*. One last name n4 is generated before expansion terminates with the following result,

(2)  n1, n2, n3, n4

representing seven sibling nodes of a tree. The sort of expansion just illustrated is useful in generating bound variables of forall loops within the derivative code sequences for Form1.
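The expansion just traced can be mimicked in a few lines of Python. This is only a sketch under our own assumptions:

- Pfunc is a plain dict keyed by strings such as '#q2', '$y' and 'y1'.
- The success or failure of !Lt is reduced to a loop test.
- The generated names n1, n2, ... come from a simple counter.

None of these details are taken from the actual pattern matcher.

```python
from itertools import count

def expand_params1(pfunc):
    """Simulate expanding
         Params1 = !zero($y) Params2
         Params2 = !Lt($y, #q2) y.*, Params2 | y.*
    and return the emitted tokens (generated names and literal commas)."""
    fresh = (f"n{i}" for i in count(1))    # hypothetical unique-name generator
    tokens = []

    def expand_y_star():
        # y.* : increment $y and bind Pfunc(y<$y>) to a freshly generated name
        pfunc["$y"] += 1
        name = next(fresh)
        pfunc[f"y{pfunc['$y']}"] = name
        tokens.append(name)

    pfunc["$y"] = 0                        # !zero($y) always succeeds
    while pfunc["$y"] < pfunc["#q2"]:      # !Lt($y, #q2) succeeds
        expand_y_star()                    # left alternand: y.* ...
        tokens.append(",")                 # ... then the literal ',' and Params2 again
    expand_y_star()                        # !Lt failed: right alternand y.*
    return tokens

pfunc = {"#q2": 3}
print(" ".join(expand_params1(pfunc)))    # n1 , n2 , n3 , n4
```

With #q2 = 3 the expansion yields seven tokens: four generated names and three commas. If each comma is counted as a node, this accounts for the seven sibling nodes mentioned above.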
We use the procedure eq(x,y) to test the equality of the trees Pfunc(x) and Pfunc(y). eq succeeds whenever the tree comparison Equals(Pfunc(x), Pfunc(y)) (cf. Appendix E (ii)) holds, and fails otherwise. For tree inequality we use the procedure ne(x,y).

Two other procedures, used only in the D table, are subst(k2,x,u) and strict. subst always succeeds; it replaces all free occurrences of x within k2 by occurrences of u. The routine strict is used within the D table in the context arg !strict, where arg can belong to either of the induction sets IV1 or IV6. strict succeeds whenever arg belongs to IV6 and fails otherwise.
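Below is a minimal Python sketch of eq, ne, subst and strict. It rests on assumptions of our own, not on the thesis's data layout:

- A tree is a nested tuple (op, child, ...) with strings as leaves.
- The only binding constructs are nodes of the hypothetical form (binder, var, body), where binder is listed in BINDERS.
- Success and failure are modeled as True and False.
- subst makes no attempt to avoid variable capture.

```python
# Hypothetical binder node kinds of the form (binder, var, body).
BINDERS = {"forall", "exists"}

def equals(a, b):
    """Structural tree comparison (a stand-in for Equals in Appendix E)."""
    return a == b                      # nested tuples and strings compare structurally

def eq(pfunc, x, y):
    return equals(pfunc[x], pfunc[y])

def ne(pfunc, x, y):
    return not eq(pfunc, x, y)

def subst(tree, x, u):
    """Replace every free occurrence of the variable x in tree by u (no capture check)."""
    if isinstance(tree, str):
        return u if tree == x else tree
    op, *kids = tree
    if op in BINDERS and kids and kids[0] == x:
        return tree                    # x is rebound here, so nothing below is free
    return (op, *(subst(k, x, u) for k in kids))

def strict(arg, iv6):
    """Succeeds exactly when arg belongs to the induction set IV6."""
    return arg in iv6

k2 = ("+", "x", ("exists", "x", ("in", "x", "t")))
print(subst(k2, "x", "u"))   # ('+', 'u', ('exists', 'x', ('in', 'x', 't')))
```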
Elementary Form Table (F)

1.  Form1 = {x ∈ x1 '|' k}
        k    = Conj & k | Conj
        Conj = F8.(x) = 0 | F9.(x) ≠ 0 | k5. | k1. = q1. |
               x ∈ x2 | x ∉ x3 | k2 ∈ x4 | k3 ∉ x5 |
               x ∈ F1(Q2) | k4 ∈ F2(Q3) | q4. ∈ F3.(x) |
               x < x6 | x ≤ x7 | x > x8 | x ≥ x9 |
               F4(x) < x10 | F5(x) ≤ x11 | F6(x) > x12 | F7(x) ≥ x13
        Q2   = q2., Q2 | q2.
        Q3   = q3., Q3 | q3.

macro Setform(op); {x ∈ x1 '|' F1(x,q) op F2(x)} endm

2.  Form2 = Setform(∈)
3.  Form3 = Setform(∉)
4.  Form4 = Setform(<)
5.  Form5 = Setform(≤)
6.  Form6 = Setform(>)
7.  Form7 = Setform(≥)
8.  Form8 = x1 + x2
9.  Form9 = [+: x ∈ x1] 1
10. Form10 = [+: x ∈ F1(Q2)] 1
11. Form11 = {x ∈ F1(Q5) '|' k}
        Q5 = q5., Q5 | q5.


Replace Table

1.  Exp1 = E(Params)
        Params = Param, Params | Param
        Param  = q1. | q2. | q3. | q4.
2.  Exp2 = E(q)
3.  Exp3 = E(q)
4.  Exp4 = E(q)
5.  Exp5 = E(q)
6.  Exp6 = E(q)
7.  Exp7 = E(q)
8.  Exp8 = E
9.  Exp9 = E
10. Exp10 = E(Q2)
        Q2 = q2., Q2 | q2.
11. Exp11 = E(Mparams)
        Mparams = Mparam, Mparams | Mparam
        Mparam  = q5. | q1. | q2. | q3. | q4.


Init Table

1.  Init1 = Srts
        E* := nullset;
        (∀x* ∈ x1, Iteradd '|' k)
            Emap1 := Emap1 + {x} ort {x};
        end ∀;
    Emap1   = !zero($k1,$w3) E(Params)
    Params  = Param, Params | Param
    Param   = k1. | !w1 | !w2 | w3.
    Iteradd = Iter, Iteradd | Iter
    Iter    = !w1* ∈ {u* ∈ Project(#q2,F1) '|' x ∈ F1(u)} |
              !w2* ∈ {u* ∈ Project(#q3,F2) '|' k4 ∈ F2(u)} |
              w3.* ∈ F3.(x)
    k       = Conj & k | Conj
    Conj    = x ∈ x2 | x ∉ x3 | k2 ∈ x4 | k3 ∉ x5 |
              x < x6 | x ≤ x7 | x > x8 | x ≥ x9 |
              F4(x) < x10 | F5(x) ≤ x11 | F6(x) > x12 | F7(x) ≥ x13 |
              F8.(x) = 0 | F9.(x) ≠ 0 | k5.
    Srts    = Srt Srts | Srt

    macro Sort(set, op, id);
        !Sortas(set, pred@id@u, succ@id@u);
        xmin@id@u := [min: w* ∈ set '|' w op x@id] w;
        pred@id@u(Ω) := [max: w ∈ set] w;
    endm;
    macro Sortr(op, id); Sort(x1, op, id) endm
    macro Sortf(op, id, num); Sort(F@num(x1), op, id) endm

    Srt = Sortr(≥,6) | Sortr(>,7) | Sortr(>,8) | Sortr(≥,9) |
          Sortf(≥,10,4) | Sortf(>,11,5) | Sortf(>,12,6) | Sortf(≥,13,7)

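The Sort macro relies on Sortas to thread a set into ascending order, so that succ@id@u and pred@id@u can step through it, with pred@id@u(Ω) giving the largest element. We assume that reading of Sortas; its actual implementation is not given here. Under that assumption, the Python sketch below builds the two maps and computes xmin@id@u as used by Sortr(≥, 6). The names sortas, first_qualifying and OMEGA are hypothetical.

```python
import operator

OMEGA = None   # stands in for SETL's undefined value Ω

def sortas(values):
    """Build successor and predecessor maps over the sorted elements of a set."""
    ordered = sorted(values)
    succ = dict(zip(ordered, ordered[1:]))
    pred = dict(zip(ordered[1:], ordered))
    pred[OMEGA] = ordered[-1] if ordered else OMEGA   # pred(Ω) = largest element
    return succ, pred

def first_qualifying(values, op, bound):
    """xmin = [min: w in values | w op bound] w, or Ω if no element qualifies."""
    candidates = [w for w in values if op(w, bound)]
    return min(candidates) if candidates else OMEGA

x1, x6 = {3, 9, 1, 7}, 5
succ, pred = sortas(x1)
xmin = first_qualifying(x1, operator.ge, x6)   # Sortr(≥, 6): first element not below x6
print(xmin, succ[xmin], pred[xmin])           # 7 9 3
```

Starting from xmin, the differentiated code only ever moves one step along succ or pred. This is what keeps the threshold updates in the D table cheap.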
2.  Init2 = E* := nullset;
        (∀x* ∈ Dom F1, y* ∈ Dom F1{x} '|' x ∈ x1 & F1(x,y) ∈ F2(x))
            E(y) := E(y) + {x} ort {x};
        end ∀;
3.  Init3 = E* := nullset;
        (∀x* ∈ Dom F1, y* ∈ Dom F1{x} '|' x ∈ x1 & F1(x,y) ∉ F2(x))
            E(y) := E(y) + {x} ort {x};
        end ∀;

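Init2 and Init3 build the map E in a single pass over the domain of F1, rather than evaluating a set former separately for each parameter value. The sketch below rests on these assumptions:

- F1 is a dict of dicts, with F1[x][y] playing the role of F1(x,y).
- F2 maps each x to a set.
- The member flag, which switches between Init2 (∈) and Init3 (∉), is our own device.
- The ort part of the assignment is omitted.

```python
from collections import defaultdict

def init_form2(x1, F1, F2, member=True):
    """E(y) := {x in x1 | F1(x,y) in F2(x)}  (Init2), or 'not in' when member=False (Init3)."""
    E = defaultdict(set)                               # E* := nullset
    for x, row in F1.items():                          # x* in Dom F1
        for y, value in row.items():                   # y* in Dom F1{x}
            if x in x1 and (value in F2.get(x, set())) == member:
                E[y].add(x)                            # E(y) := E(y) + {x}
    return dict(E)

x1 = {"a", "b"}
F1 = {"a": {1: "p", 2: "q"}, "b": {1: "q"}, "c": {1: "p"}}
F2 = {"a": {"p"}, "b": {"q"}, "c": {"p"}}
print(init_form2(x1, F1, F2))   # E(1) holds 'a' and 'b'; 'c' is skipped because it is not in x1
```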
macro Initform(op1, op2);
    E* := nullset;
    (∀x* ∈ Dom F1, y* ∈ Dom F1{x} '|' F1(x,y) op1 F2(x))
        E(y) := E(y) + {x} ort {x};
    end ∀;
    xmin@u* := nullset;
    (∀x ∈ x1)
        Sortas(Dom F1{x}, pred@u, succ@u, x);
        pred@u(x,Ω) := [max: y ∈ Dom F1{x}] y;
        xmin@u(x) := [min: y ∈ Dom F1{x} '|' y op2 F2(x)] y;
    end ∀;
endm

4.  Init4 = Initform(<, ≥)
5.  Init5 = Initform(≤, >)
6.  Init6 = Initform(>, >)
7.  Init7 = Initform(≥, ≥)
8.  Init8 = E* := x1 + x2;
9.  Init9 = E* := [+: x* ∈ x1] 1;

10. Straightforward Initialization
    a.  Init10 = E* := nullset;
            (∀[Params1] ∈ Project(#q2,F1))
                E(Params3) := [+: x* ∈ F1(Params3)] 1;
            end ∀;
        Params1 = !zero($y) Params2
        Params2 = !Lt($y,#q2) y.*, Params2 | y.*
        Params3 = !zero($y) Params4
        Params4 = y., Params4 | y.
    Differential Initialization
    b.  Parameter change:
            F1 := nullset;
        Prederivative:
            E* := nullset;
    c.  Parameter change:
            F1(Params5) := F1(Params5) + A ort A;
        Params5 = !zero($y) Params6
        Params6 = !Lt($y,#q2) y., Params6 | y.
        Prederivative:
            E(Params5) := E(Params5) + 1 ort 1;

11. Init11 = E* := nullset;
        (∀[Params7] ∈ Project(#q1+#q2+#q3+#q4+#q5, F1))
            E(Params3) := {x* ∈ F1(Params3) '|' k};
        end ∀;
    Params7 = !zero($y) Params8
    Params8 = !Lt($y, #q1+#q2+#q3+#q4+#q5) y.*, Params8 | y.*


Derivative Table (D)

macro Itin(arg);
    A - arg !strict | A
endm
macro Itout(arg);
    A * arg !strict | A
endm

macro Iteradd(skip, num);
    Iterad
    Iterad = Itera, Iterad | Itera
    Itera  = !ne(skip,F1) ![P6] ∈ {[P5] ∈ Project(#q2,F1) '|' x ∈ F1(u)} |
             !ne(skip,F2) ![P7] ∈ {[P5] ∈ Project(#q3,F2) '|' k4 ∈ F2(u)} |
             !Lt($F3,num-1) w3.* ∈ F3.(x) |
             !eq(skip,F3) !eq($F3,num-1) w3.* ∈ Itin(F3.(x)) |
             !Lt($F3,#F3) w3.* ∈ F3.(x)
endm

macro Itersub(skip, num);
    Itersu
    Itersu = Iters, Itersu | Iters
    Iters  = !ne(skip,F1) ![P6] ∈ {[P5] ∈ Project(#q2,F1) '|' x ∈ F1(u)} * Dom E |
             !ne(skip,F2) ![P7] ∈ {[P5] ∈ Project(#q3,F2) '|' k4 ∈ F2(u)} * Dom E{w1} |
             !Lt($F3,num-1) w3.* ∈ F3.(x) * Dom E{Params1} |
             !eq(skip,F3) !eq($F3,num-1) w3.* ∈ Itout(F3.(x)) * Dom E{Params1} |
             !Lt($F3,#F3) w3.* ∈ F3.(x) * Dom E{Params1}
endm

Params1    = !zero($w3,$w1,$w2) Params2
Params2    = Param2, Params2 | Param2
Param2     = w1. | w2. | w3.
Params     = !zero($k1,$w3,$w1,$w2) Parameters
Parameters = Param, Parameters | Param
Param      = k1. | w1. | w2. | w3.
P5  = !zero($u) P8
P8  = u.*, P8 | u.*
P6  = !zero($w1) P9
P9  = w1.*, P9 | w1.*
P7  = !zero($w2) P10
P10 = w2.*, P10 | w2.*

macro k(skip, num);
    bool
    bool = Conj & bool | Conj
    Conj = !ne(skip,x2) ! x ∈ x2 |
           !ne(skip,x3) ! x ∉ x3 |
           !ne(skip,x4) ! k2 ∈ x4 |
           !ne(skip,x5) ! k3 ∉ x5 |
           !ne(skip,x6) ! x < x6 |
           !ne(skip,x7) ! x ≤ x7 |
           !ne(skip,x8) ! x > x8 |
           !ne(skip,x9) ! x ≥ x9 |
           !ne(skip,x10) ! F4(x) < x10 |
           !ne(skip,x11) ! F5(x) ≤ x11 |
           !ne(skip,x12) ! F6(x) > x12 |
           !ne(skip,x13) ! F7(x) ≥ x13 |
           !Lt($F8,num-1) F8.(x) = 0 |
           !eq(skip,F8) !eq($F8,num-1) !Inc($F8) |
           !Lt($F8,#F8) F8.(x) = 0 |
           !Lt($F9,num-1) F9.(x) ≠ 0 |
           !eq(skip,F9) !eq($F9,num-1) !Inc($F9) |
           !Lt($F9,#F9) F9.(x) ≠ 0 |
           k5.
endm

1a. Parameter change:
        x1 := x1 + A;
    Prederivative:
        (∀x* ∈ Itin(x1), Iteradd '|' k)
            E(Params) := E(Params) + {x};
        end ∀;

b.  Parameter change:
        x1 := x1 - A;
    Prederivative:
        (∀x* ∈ Itout(x1), Itersub '|' k)
            E(Params) := E(Params) - {x};
        end ∀;

c.  Parameter change:
        x2 := x2 + A;
    Prederivative:
        (∀x* ∈ Itin(x2), Iteradd '|' k(x2) & x ∈ x1)
            E(Params) := E(Params) + {x};
        end ∀;

d.  Parameter change:
        x2 := x2 - A;
    Prederivative:
        (∀x* ∈ Itout(x2), Itersub '|' k(x2) & x ∈ x1)
            E(Params) := E(Params) - {x};
        end ∀;

e.  Parameter change:
        x3 := x3 + A;
    Prederivative:
        (∀x* ∈ Itin(x3), Itersub '|' k(x3) & x ∈ x1)
            E(Params) := E(Params) - {x};
        end ∀;

f.  Parameter change:
        x3 := x3 - A;
    Prederivative:
        (∀x* ∈ Itout(x3), Iteradd '|' k(x3) & x ∈ x1)
            E(Params) := E(Params) + {x};
        end ∀;
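Rules a through f can be made concrete with a Python sketch that keeps E = {x ∈ x1 | x ∈ x2 & x ∉ x3} current. The sketch assumes:

- k consists of just those two conjuncts.
- Every induction variable is strict, so that Itin(s) = A - s and Itout(s) = A * s.

As in the table, each method runs the prederivative against the old values and only then applies the parameter change. The class and method names are hypothetical.

```python
class Form1Lite:
    """E = {x in x1 | x in x2 and x not in x3}, kept current under updates to x1, x2, x3."""

    def __init__(self, x1, x2, x3):
        self.x1, self.x2, self.x3 = set(x1), set(x2), set(x3)
        self.E = {x for x in self.x1 if x in self.x2 and x not in self.x3}   # Init1

    def add_x1(self, A):                 # rule a: x in Itin(x1) = A - x1, test k
        for x in A - self.x1:
            if x in self.x2 and x not in self.x3:
                self.E.add(x)
        self.x1 |= A

    def del_x1(self, A):                 # rule b: x in Itout(x1) = A * x1
        for x in A & self.x1:
            self.E.discard(x)            # no-op unless x satisfied k
        self.x1 -= A

    def add_x2(self, A):                 # rule c: k(x2) & x in x1
        for x in A - self.x2:
            if x in self.x1 and x not in self.x3:
                self.E.add(x)
        self.x2 |= A

    def del_x2(self, A):                 # rule d
        for x in A & self.x2:
            self.E.discard(x)
        self.x2 -= A

    def add_x3(self, A):                 # rule e: new members of x3 leave E
        for x in A - self.x3:
            self.E.discard(x)
        self.x3 |= A

    def del_x3(self, A):                 # rule f: former members of x3 may rejoin E
        for x in A & self.x3:
            if x in self.x1 and x in self.x2:
                self.E.add(x)
        self.x3 -= A

f = Form1Lite(x1={1, 2, 3, 4}, x2={2, 3, 4}, x3={4})
f.add_x1({5, 6})
f.add_x2({5})
print(sorted(f.E))   # [2, 3, 5]
```

Each update looks only at the elements of A, never at all of x1. That is the saving the differentiated code is designed to deliver.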

g.  Parameter change:
        x4 := x4 + A;
    Prederivative:
        (∀y* ∈ Itin(x4), x* ∈ {u* ∈ x1 '|' !subst(k2,x,u) = y}, Iteradd '|' k(x4))
            E(Params) := E(Params) + {x};
        end ∀;

h.  Parameter change:
        x4 := x4 - A;
    Prederivative:
        (∀y* ∈ Itout(x4), x* ∈ {u* ∈ x1 '|' !subst(k2,x,u) = y}, Itersub '|' k(x4))
            E(Params) := E(Params) - {x};
        end ∀;

i.  Parameter change:
        x5 := x5 + A;
    Prederivative:
        (∀y* ∈ Itin(x5), x* ∈ {u* ∈ x1 '|' !subst(k3,x,u) = y}, Itersub '|' k(x5))
            E(Params) := E(Params) - {x};
        end ∀;

j.  Parameter change:
        x5 := x5 - A;
    Prederivative:
        (∀y* ∈ Itout(x5), x* ∈ {u* ∈ x1 '|' !subst(k3,x,u) = y}, Iteradd '|' k(x5))
            E(Params) := E(Params) + {x};
        end ∀;

k.  Parameter change:
        F3@$F3(x) := F3@$F3(x) + A;
    Prederivative:
        if x ∈ x1 then
            (∀Iteradd(F3,$F3) '|' k)
                E(Params) := E(Params) + {x};
            end ∀;
        endif;

l.  Parameter change:
        F3@$F3(x) := F3@$F3(x) - A;
    Prederivative:
        if x ∈ x1 then
            (∀Itersub(F3,$F3) '|' k)
                E(Params) := E(Params) - {x};
            end ∀;
        endif;

m.  Parameter change:
        F1(Params10) := F1(Params10) + A;
    Params10 = !zero($w1) Params11
    Params11 = !Lt($w1,#q2) w1., Params11 | w1.
    Prederivative:
        (∀x* ∈ Itin(F1(Params10)), Iteradd(F1) '|' x ∈ x1 & k)
            E(Params) := E(Params) + {x};
        end ∀;

n.  Parameter change:
        F1(Params10) := F1(Params10) - A;
    Prederivative:
        (∀x* ∈ Itout(F1(Params10)), Itersub(F1) '|' x ∈ x1 & k)
            E(Params) := E(Params) - {x};
        end ∀;

o.  Parameter change:
        F2(Params12) := F2(Params12) + A;
    Params12 = !zero($w2) Params13
    Params13 = !Lt($w2,#q3) w2., Params13 | w2.
    Prederivative:
        (∀y* ∈ Itin(F2(Params12)), x* ∈ {u* ∈ x1 '|' !subst(k4,x,u) = y},
              Iteradd(F2) '|' k)
            E(Params) := E(Params) + {x};
        end ∀;

p.  Parameter change:
        F2(Params12) := F2(Params12) - A;
    Prederivative:
        (∀y* ∈ Itout(F2(Params12)), x* ∈ {u* ∈ x1 '|' !subst(k4,x,u) = y},
              Itersub(F2) '|' k)
            E(Params) := E(Params) - {x};
        end ∀;

macro Dsuccr(relop, setop, id);
    (while xmin@id@u relop x@id + A)
        (∀x* := xmin@id@u, Iterator '|' k(x@id))
            E(Params) := E(Params) setop {x};
        end ∀;
        xmin@id@u := succ@id@u(xmin@id@u);
    endwhile;
    Iterator = !eq(setop,+) Iteradd | Itersub
endm

macro Dpredr(relop, setop, id);
    (while pred@id@u(xmin@id@u) relop x@id - A)
        xmin@id@u := pred@id@u(xmin@id@u);
        (∀x* := xmin@id@u, Iterator '|' k(x@id))
            E(Params) := E(Params) setop {x};
        end ∀;
    endwhile;
    Iterator = !eq(setop,+) Iteradd | Itersub
endm

macro Dsuccf(relop, setop, id, arg);
    (while xmin@id@u relop x@id + A)
        (∀x* ∈ {u* ∈ x1 '|' arg(u) = xmin@id@u}, Iterator '|' k(x@id))
            E(Params) := E(Params) setop {x};
        end ∀;
        xmin@id@u := succ@id@u(xmin@id@u);
    endwhile;
    Iterator = !eq(setop,+) Iteradd | Itersub
endm

macro Dpredf(relop, setop, id, arg);
    (while pred@id@u(xmin@id@u) relop x@id - A)
        xmin@id@u := pred@id@u(xmin@id@u);
        (∀x* ∈ {u* ∈ x1 '|' arg(u) = xmin@id@u}, Iterator '|' k(x@id))
            E(Params) := E(Params) setop {x};
        end ∀;
    endwhile;
    Iterator = !eq(setop,+) Iteradd | Itersub
endm
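Dsuccr and Dpredr maintain a form such as {x ∈ x1 | x < x6} by walking the sorted order of x1 from the boundary element xmin@id@u. The work done is therefore proportional to the number of elements that actually cross the moving threshold.

The Python sketch below illustrates rules q and r for a fixed set x1 and a nonnegative increment A. It represents xmin@6@u as an index into a sorted list, so that succ and pred become index +1 and -1. This is our simplification, not the thesis's data structure, and BelowThreshold is a hypothetical name.

```python
import bisect

class BelowThreshold:
    """E = {x in x1 | x < x6} for a fixed set x1, maintained as x6 moves.
    self.i indexes xmin, the first element of the sorted x1 that is not in E."""

    def __init__(self, x1, x6):
        self.order = sorted(x1)
        self.x6 = x6
        self.i = bisect.bisect_left(self.order, x6)   # first w with w >= x6
        self.E = set(self.order[:self.i])

    def raise_by(self, A):
        """Rule q, Dsuccr(<, +, 6): elements below the new bound x6 + A join E."""
        while self.i < len(self.order) and self.order[self.i] < self.x6 + A:
            self.E.add(self.order[self.i])
            self.i += 1                                # xmin := succ(xmin)
        self.x6 += A

    def lower_by(self, A):
        """Rule r, Dpredr(>=, -, 6): elements at or above the new bound x6 - A leave E."""
        while self.i > 0 and self.order[self.i - 1] >= self.x6 - A:
            self.i -= 1                                # xmin := pred(xmin)
            self.E.discard(self.order[self.i])
        self.x6 -= A

b = BelowThreshold({2, 5, 8, 11}, 6)
b.raise_by(4)          # x6 becomes 10
print(sorted(b.E))     # [2, 5, 8]
b.lower_by(7)          # x6 becomes 3
print(sorted(b.E))     # [2]
```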

q.  Parameter change:
        x6 := x6 + A;
    Prederivative:
        Dsuccr(<, +, 6)

r.  Parameter change:
        x6 := x6 - A;
    Prederivative:
        Dpredr(≥, -, 6)

s.  Parameter change:
        x7 := x7 + A;
    Prederivative:
        Dsuccr(≤, +, 7)

t.  Parameter change:
        x7 := x7 - A;
    Prederivative:
        Dpredr(>, -, 7)

u.  Parameter change:
        x8 := x8 + A;
    Prederivative:
        Dsuccr(≤, -, 8)

v.  Parameter change:
        x8 := x8 - A;
    Prederivative:
        Dpredr(>, +, 8)

w.  Parameter change:
        x9 := x9 + A;
    Prederivative:
        Dsuccr(<, -, 9)

x.  Parameter change:
        x9 := x9 - A;
    Prederivative:
        Dpredr(≥, +, 9)


y.  Parameter change:
        x10 := x10 + A;
    Prederivative:
        Dsuccf(<, +, 10, F4)

z.  Parameter change:
        x10 := x10 - A;
    Prederivative:
        Dpredf(≥, -, 10, F4)

aa. Parameter change:
        x11 := x11 + A;
    Prederivative:
        Dsuccf(≤, +, 11, F5)

bb. Parameter change:
        x11 := x11 - A;
    Prederivative:
        Dpredf(>, -, 11, F5)

cc. Parameter change:
        x12 := x12 + A;
    Prederivative:
        Dsuccf(≤, -, 12, F6)

dd. Parameter change:
        x12 := x12 - A;
    Prederivative:
        Dpredf(>, +, 12, F6)

ee. Parameter change:
        x13 := x13 + A;
    Prederivative:
        Dsuccf(<, -, 13, F7)

ff. Parameter change:
        x13 := x13 - A;
    Prederivative:
        Dpredf(≥, +, 13, F7)

gg. Parameter change:
        F8@$F8(x) := F8@$F8(x) + A;
    Prederivative:
        if x ∈ x1 & F8@$F8(x) = 0 then
            (∀Itersub '|' k(F8,$F8))
                E(Params) := E(Params) - {x};
            end ∀;
        endif;

hh. Parameter change:
        F8@$F8(x) := F8@$F8(x) - A;
    Prederivative:
        if x ∈ x1 & F8@$F8(x) = A then
            (∀Iteradd '|' k(F8,$F8))
                E(Params) := E(Params) + {x};
            end ∀;
        endif;

ii. Parameter change:
        F9@$F9(x) := F9@$F9(x) + A;
    Prederivative:
        if x ∈ x1 & F9@$F9(x) = 0 then
            (∀Iteradd '|' k(F9,$F9))
                E(Params) := E(Params) + {x};
            end ∀;
        endif;

jj. Parameter change:
        F9@$F9(x) := F9@$F9(x) - A;
    Prederivative:
        if x ∈ x1 & F9@$F9(x) = A then
            (∀Itersub '|' k(F9,$F9))
                E(Params) := E(Params) - {x};
            end ∀;
        endif;

2a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        if y ∈ x1 then
            (∀x* ∈ Itin(F2(y)), u* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = x})
                E(u) := E(u) + {y};
            end ∀;
        endif;

b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        if y ∈ x1 then
            (∀x* ∈ Itout(F2(y)), u* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = x})
                E(u) := E(u) - {y};
            end ∀;
        endif;

3a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        if y ∈ x1 then
            (∀x* ∈ Itin(F2(y)), u* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = x})
                E(u) := E(u) - {y};
            end ∀;
        endif;

b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        if y ∈ x1 then
            (∀x* ∈ Itout(F2(y)), u* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = x})
                E(u) := E(u) + {y};
            end ∀;
        endif;

macro xsucc(relop, setop);
    (while xmin@u(y) relop F2(y) + A)
        (∀x* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = xmin@u(y)})
            E(x) := E(x) setop {y};
        end ∀;
        xmin@u(y) := succ@u(y, xmin@u(y));
    endwhile;
endm

macro xpred(relop, setop);
    (while pred@u(y, xmin@u(y)) relop F2(y) - A)
        xmin@u(y) := pred@u(y, xmin@u(y));
        (∀x* ∈ {w* ∈ Dom F1{y} '|' F1(y,w) = xmin@u(y)})
            E(x) := E(x) setop {y};
        end ∀;
    endwhile;
endm

4a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        xsucc(<, +)
b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        xpred(≥, -)
5a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        xsucc(≤, +)
b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        xpred(>, -)
6a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        xsucc(≤, -)
b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        xpred(>, +)
7a. Parameter change:
        F2(y) := F2(y) + A;
    Prederivative:
        xsucc(<, -)
b.  Parameter change:
        F2(y) := F2(y) - A;
    Prederivative:
        xpred(≥, +)

8a. Parameter change:
        x1 := x1 + A;
    Prederivative:
        E := E + A;

b.  Parameter change:
        x1 := x1 - A;
    Prederivative:
        E := E - (A - x2);

c.  Parameter change:
        x2 := x2 + A;
    Prederivative:
        E := E + A;

d.  Parameter change:
        x2 := x2 - A;
    Prederivative:
        E := E - (A - x1);
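Rules 8a-8d keep E = x1 + x2 correct without recomputing the union. Below is a short Python check, assuming plain Python sets for x1, x2 and A (UnionView is a hypothetical name).

Note that 8b subtracts A - x2 rather than A. This is because elements of A that are still present in x2 must stay in the union.

```python
class UnionView:
    """E = x1 + x2 (set union), maintained by rules 8a-8d."""

    def __init__(self, x1, x2):
        self.x1, self.x2 = set(x1), set(x2)
        self.E = self.x1 | self.x2          # Init8: E := x1 + x2

    def add_x1(self, A):                    # 8a: E := E + A
        self.E |= A
        self.x1 |= A

    def del_x1(self, A):                    # 8b: E := E - (A - x2), using the old x2
        self.E -= A - self.x2
        self.x1 -= A

    def add_x2(self, A):                    # 8c: E := E + A
        self.E |= A
        self.x2 |= A

    def del_x2(self, A):                    # 8d: E := E - (A - x1), using the old x1
        self.E -= A - self.x1
        self.x2 -= A

v = UnionView({1, 2}, {2, 3})
v.del_x1({1, 2})
print(sorted(v.E))   # [2, 3]  (2 survives because it is still in x2)
```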

9a. Parameter change:
        x1 := x1 + A;
    Prederivative:
        E := E + [+: x ∈ Itin(x1)] 1;

b.  Parameter change:
        x1 := x1 - A;
    Prederivative:
        E := E - [+: x ∈ Itout(x1)] 1;
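For Form9 the derivative only has to count the elements that actually enter or leave x1. The sketch below assumes the strict case, in which Itin(x1) = A - x1 and Itout(x1) = A * x1. Adding len(A) instead would overcount whenever A overlaps x1. CountView is a hypothetical name.

```python
class CountView:
    """E = [+: x in x1] 1, i.e. the size of x1, maintained by rules 9a and 9b."""

    def __init__(self, x1):
        self.x1 = set(x1)
        self.E = len(self.x1)            # Init9

    def add(self, A):                    # 9a: E := E + [+: x in Itin(x1)] 1
        self.E += len(A - self.x1)       # Itin(x1) taken as A - x1 (strict case)
        self.x1 |= A

    def remove(self, A):                 # 9b: E := E - [+: x in Itout(x1)] 1
        self.E -= len(A & self.x1)       # Itout(x1) taken as A * x1
        self.x1 -= A

c = CountView({1, 2, 3})
c.add({3, 4})        # only 4 is new
c.remove({1, 9})     # only 1 was present
print(c.E)           # 3
```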
10a. Parameter change:
        F1(Params3) := F1(Params3) + A;
    Params3 = !zero($y) Params4
    Params4 = !Lt($y,#q2) y., Params4 | y.
    Prederivative:
        E(Params3) := E(Params3) + [+: x* ∈ Itin(F1(Params3))] 1;

b.  Parameter change:
        F1(Params3) := F1(Params3) - A;
    Prederivative:
        E(Params3) := E(Params3) - [+: x* ∈ Itout(F1(Params3))] 1;

APPENDIX D
VARIOUS ELEMENTARY AND COMPOUND SET THEORETIC TRANSFORMATIONS

In this section we give a list of auxiliary set theoretic transformations likely to be useful supplements to the formal differentiation around which our proposed implementation will be built. The list includes transformations likely to aid in the preparatory and cleanup tasks that arise before and after formal differentiation. Although most of these rules lie at a relatively low level, at the end of the present appendix we describe a way to collect simple rules into 'rule groups' that can be applied automatically and collectively over a region.

Each transformation we consider is written as a rewrite rule in one of two forms. We write LHS => RHS to indicate that the LHS pattern can be replaced by RHS; the second form, LHS <=> RHS, designates a production allowing replacement either of RHS by LHS or of LHS by RHS. The notation <k, x \ y> indicates that all occurrences of the term x in k are to be replaced by y.

Unless otherwise specified, we assume that all transformable expressions are applicative, and consequently side-effect free. For this reason the usual transformations (e.g., commutative laws), which can rearrange the order in which computations are expressed, can in theory be expected to hold. Since SETL primitive operations treat boolean arguments as false if not explicitly true, we expect that standard identities such as

    ∃x ∈ s | f(x) > g(x)  <=>  ¬∀x ∈ s | ¬(f(x) > g(x))

are preserved. Because of finite computer data representation and machine-dependent handling of error conditions, issues of code motion safety frequently pose obstacles to nontrivial arithmetic transformations (e.g., distributive laws may not hold). This does not concern us, since our primary interest is in expressions involving finite sets and set theoretic relations. However, even when arithmetic operations appear, the order of evaluation may sometimes be changed without sacrificing accuracy (cf. wi for a further discussion of safety problems in formal differentiation).

Some of the transformations listed here rearrange the execution order of code C; they may be applied only when the Usetodef and Deftouse maps do not change, a precondition which we call 'transformational disjointness'.

i.  Simple Set Identities (all arguments are set valued)

COMMUTATIVE Laws
C1.  S * T <=> T * S
C2.  S + T <=> T + S

ASSOCIATIVE Laws
A1.  S * (T * Q) <=> (S * T) * Q
A2.  S + (T + Q) <=> (S + T) + Q

Idempotent Laws
I1.  S * S <=> S
I2.  S + S <=> S

Rules for Nullset
Neutral Element
N1.  S + nullset <=> S
N2.  S - nullset <=> S
N3.  S * nullset <=> nullset
N4.  S - S <=> nullset

DISTRIBUTIVE Laws
D1.  S * (T + Q) <=> (S * T) + (S * Q)
D2.  S + (T * Q) <=> (S + T) * (S + Q)
D3.  S * (T - Q) <=> (S * T) - (S * Q)
D4.  (S + T) - Q <=> (S - Q) + (T - Q)

MISCELLANEOUS Rules
M1.  S - (T + Q) <=> (S - T) - Q <=> (S - Q) - T
M2.  S + (T - Q) <=> (S + T) - (Q - S)
M3.  S - (T - Q) <=> (S - T) + (S * Q)
M4.  S - (T * Q) <=> (S - T) + (S - Q)
M5.  S * (T - Q) <=> (S - Q) * T
M6.  S - T <=> S - (T * S) <=> (S - T) - T
M7.  S <=> (S + T) - (T - S) <=> (S - T) + (S * T)
M8.  S + T <=> S + (T - S)
M9.  S * T <=> S - (S - T)
M10. (S - T) * T <=> nullset

Rules which Require T INCS S as an Enabling Condition
E1.  S - T <=> nullset
E2.  S * T <=> S
E3.  S + T <=> T
E4.  S ≠ T <=> T - S ≠ nullset

Tautologies
T1.  T INCS nullset
T2.  T INCS T * S
T3.  T INCS T - S
T4.  T + S INCS S
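The identities above can be checked mechanically, because +, * and - act element by element. For the equational rules, whether an element belongs to either side depends only on its membership in S, T and Q. Trying every triple of subsets of a small universe therefore exercises every case.

The Python sketch below is our own check, using | for +, & for * and - for -. It covers the distributive, miscellaneous and enabling-condition rules. An enabling-condition rule counts as satisfied whenever T INCS S fails.

```python
from itertools import combinations, product

UNIVERSE = range(3)
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]
EMPTY = frozenset()

# SETL + is union (|), * is intersection (&), - is set difference (-).
SET_RULES = {
    "D1": lambda S, T, Q: S & (T | Q) == (S & T) | (S & Q),
    "D2": lambda S, T, Q: S | (T & Q) == (S | T) & (S | Q),
    "D3": lambda S, T, Q: S & (T - Q) == (S & T) - (S & Q),
    "D4": lambda S, T, Q: (S | T) - Q == (S - Q) | (T - Q),
    "M1": lambda S, T, Q: S - (T | Q) == (S - T) - Q == (S - Q) - T,
    "M2": lambda S, T, Q: S | (T - Q) == (S | T) - (Q - S),
    "M3": lambda S, T, Q: S - (T - Q) == (S - T) | (S & Q),
    "M4": lambda S, T, Q: S - (T & Q) == (S - T) | (S - Q),
    "M5": lambda S, T, Q: S & (T - Q) == (S - Q) & T,
    "M6": lambda S, T, Q: S - T == S - (T & S) == (S - T) - T,
    "M7": lambda S, T, Q: S == (S | T) - (T - S) == (S - T) | (S & T),
    "M8": lambda S, T, Q: S | T == S | (T - S),
    "M9": lambda S, T, Q: S & T == S - (S - T),
    "M10": lambda S, T, Q: (S - T) & T == EMPTY,
    # Enabling-condition rules only need to hold when T INCS S (T >= S).
    "E1": lambda S, T, Q: not T >= S or S - T == EMPTY,
    "E2": lambda S, T, Q: not T >= S or S & T == S,
    "E3": lambda S, T, Q: not T >= S or S | T == T,
    "E4": lambda S, T, Q: not T >= S or (S != T) == (T - S != EMPTY),
}

def failing(rules):
    """Names of rules that are false for some S, T, Q drawn from SUBSETS."""
    return [name for name, rule in rules.items()
            if not all(rule(S, T, Q) for S, T, Q in product(SUBSETS, repeat=3))]

print(failing(SET_RULES))   # []
```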
ii.  Boolean and Relational Identities

ASSOCIATIVE
A1.  K & (J & L) <=> (K & J) & L
A2.  K or (J or L) <=> (K or J) or L

Neutral Element
N1.  K & true <=> K
N2.  K or false <=> K

DISTRIBUTIVE
D1.  K & (J or L) <=> (K & J) or (K & L)
D2.  K or (J & L) <=> (K or J) & (K or L)

COMMUTATIVE
C1.  K & J <=> J & K
C2.  K or J <=> J or K

Idempotent
I1.  K & K <=> K
I2.  K or K <=> K

Rules for true, false
N3.  K & false <=> false
N4.  K or true <=> true

NEGATION
N5.  ¬¬K <=> K
N6.  ¬false <=> true

DE MORGAN'S LAWS
M1.  K or J <=> ¬(¬K & ¬J)
M2.  K & J <=> ¬(¬K or ¬J)

RELATIONAL and Negation
R1.  ¬(K ≠ J) <=> K = J
R2.  ¬(K ∉ S) <=> K ∈ S
R3.  ¬(I > M) <=> I ≤ M   (†)
R4.  ¬(I ≥ M) <=> I < M   (†)

Relational
R5.  I < M <=> M > I
R6.  I ≤ M <=> M ≥ I
R7.  I = M <=> M = I
R8.  I ≠ M <=> M ≠ I

† I and M must be integers.
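The boolean laws can be confirmed by truth table. The relational ones can be confirmed by exhaustive testing over a small range of integers, which respects the footnote's restriction for R3 and R4.

The Python sketch below is our own check of a selection of the rules. Python's and, or and not stand in for &, or and ¬.

```python
from itertools import product

# A selection of the boolean laws of section ii.
BOOL_RULES = {
    "D1": lambda K, J, L: (K and (J or L)) == ((K and J) or (K and L)),
    "D2": lambda K, J, L: (K or (J and L)) == ((K or J) and (K or L)),
    "N3": lambda K, J, L: (K and False) == False,
    "N4": lambda K, J, L: (K or True) == True,
    "N5": lambda K, J, L: (not not K) == K,
    "M1": lambda K, J, L: (K or J) == (not (not K and not J)),
    "M2": lambda K, J, L: (K and J) == (not (not K or not J)),
}

# Relational laws R3-R6, checked on integers as the footnote requires.
REL_RULES = {
    "R3": lambda I, M: (not I > M) == (I <= M),
    "R4": lambda I, M: (not I >= M) == (I < M),
    "R5": lambda I, M: (I < M) == (M > I),
    "R6": lambda I, M: (I <= M) == (M >= I),
}

def failing_rules(rules, domain, arity):
    """Names of rules that are false for some tuple of `arity` values from `domain`."""
    return [name for name, rule in rules.items()
            if not all(rule(*args) for args in product(domain, repeat=arity))]

print(failing_rules(BOOL_RULES, [False, True], 3))   # []
print(failing_rules(REL_RULES, range(-3, 4), 2))     # []
```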
MORE COMPLICATED RULES

K = J <=> K ∈ {J}

[or: x1 ∈ s1, ..., xn ∈ sn | k1] [or: y1 ∈ t1, ..., ym ∈ tm | k2] k3
    <=> [or: x1 ∈ s1, ..., xn ∈ sn, y1 ∈ t1, ..., ym ∈ tm | k1 & k2] k3

[&: x1 ∈ s1, ..., xn ∈ sn | k1] [&: y1 ∈ t1, ..., ym ∈ tm | k2] k3
    <=> [&: x1 ∈ s1, ..., xn ∈ sn, y1 ∈ t1, ..., ym ∈ tm | k1 & k2] k3

DE MORGAN'S LAWS

[&: x1 ∈ s1, ..., xn ∈ sn | k1] k2 <=> ¬[or: x1 ∈ s1, ..., xn ∈ sn | k1] ¬k2

[or: x1 ∈ s1, ..., xn ∈ sn | k1] k2 <=> ¬[&: x1 ∈ s1, ..., xn ∈ sn | k1] ¬k2

DISTRIBUTIVE LAWS

[or: x1 ∈ s1, ..., xn ∈ sn | k1](k2 or k3)
    <=> [or: x1 ∈ s1, ..., xn ∈ sn | k1] k2 or [or: x1 ∈ s1, ..., xn ∈ sn | k1] k3

[&: x1 ∈ s1, ..., xn ∈ sn | k1](k2 & k3)
    <=> [&: x1 ∈ s1, ..., xn ∈ sn | k1] k2 & [&: x1 ∈ s1, ..., xn ∈ sn | k1] k3

SIMPLIFICATION

If x1, ..., xn are not free in k2 and {<x1, ..., xn>: x1 ∈ s1, ..., xn ∈ sn | k1} ≠ nullset, then

[or: x1 ∈ s1, ..., xn ∈ sn | k1] k2 <=> k2
[&: x1 ∈ s1, ..., xn ∈ sn | k1] k2 <=> k2

iii.  Set Former Manipulation and Simplification Rules

DISTRIBUTIVE LAWS

{e: x1 ∈ s1, ..., xn ∈ sn | k1 or k2}
    <=> {e: x1 ∈ s1, ..., xn ∈ sn | k1} + {e: x1 ∈ s1, ..., xn ∈ sn | k2}

{e: x1 ∈ s1, ..., xn ∈ sn | k1 & k2}
    <=> {e: x1 ∈ s1, ..., xn ∈ sn | k1} * {e: x1 ∈ s1, ..., xn ∈ sn | k2}   (†)

{e: x1 ∈ s1, ..., xn ∈ sn | [or: y1 ∈ t1, ..., ym ∈ tm] K}
    <=> [+: y1 ∈ t1, ..., ym ∈ tm] {e: x1 ∈ s1, ..., xn ∈ sn | K}   (*)

{e: x1 ∈ s1, ..., xn ∈ sn | [&: y1 ∈ t1, ..., ym ∈ tm] K}
    <=> [*: y1 ∈ t1, ..., ym ∈ tm] {e: x1 ∈ s1, ..., xn ∈ sn | K}   (†)

*, †  transformational disjointness required

[+: x1 ∈ s1, ..., xn ∈ sn | K] {e} <=> {e: x1 ∈ s1, ..., xn ∈ sn | K}

A := exp({e: x1 ∈ s1, ..., xn ∈ sn | K})
    =>   (†)
A := nullset;
(∀x1 ∈ s1, ..., xn ∈ sn | K)
    A := A + exp({e});
END ∀;

† A must not be free in the RHS of the assignment; transformational disjointness is required; exp is an expression in which exp({e1, e2}) = exp({e1}) + exp({e2}) holds.

[+: x1 ∈ s1, ..., xn ∈ sn | K](J * T) <=> J * [+: x1 ∈ s1, ..., xn ∈ sn | K] T
    where x1, ..., xn are not free in J.

Simplification Rules (basic to all iterative operations)

s1.  {e: x1 ∈ s1, ..., xj ∈ {p}, ..., xn ∈ sn | k}
       <=> {<e, xj \ p>: x1 ∈ s1, ..., x(j-1) ∈ s(j-1), x(j+1) ∈ <s(j+1), xj \ p>, ..., xn ∈ <sn, xj \ p> | <k, xj \ p>}

s2.  {e: x1 ∈ s1, ..., xj ∈ sj, ..., xn ∈ sn | xj ∈ Q & K}
       <=> {e: x1 ∈ s1, ..., xj ∈ Q, ..., xn ∈ sn | K}   (*)
     * when sj INCS Q, e.g., Q = {p},
       where Q and sj do not have free occurrences of x(j+1), ..., xn.

s3.  {e: x1 ∈ s1, ..., xj ∈ sj, ..., xi ∈ si, ..., xn ∈ sn | k}
       <=> {e: x1 ∈ s1, ..., xi ∈ si, ..., xj ∈ sj, ..., xn ∈ sn | k}
     where si has no free occurrences of xj, ..., x(i-1),
     sj has no free occurrences of x(j+1), ..., xi,
     and for L = j+1, ..., i-1, sL does not have free occurrences of xj or xi.

s4.  {e: x1 ∈ s1, ..., xj ∈ nullset, ..., xn ∈ sn | k} => nullset

iv.  Forall LOOP LAWS

(∀x1 ∈ s1, ..., xn ∈ sn | k) BLOCK END ∀;
    <=>
(∀x1 ∈ s1)
    (∀x2 ∈ s2, ..., xn ∈ sn | k) BLOCK END ∀;
END ∀;

(∀x ∈ s) BLOCK1 END ∀;
(∀x ∈ s) BLOCK2 END ∀;
    <=>
(∀x ∈ s) BLOCK1 BLOCK2 END ∀;
    provided BLOCK1(x) commutes with BLOCK2(y) for every x, y ∈ s with x ≠ y.

Obvious analogues of the three set former simplification rules discussed previously in iii can be made for 'forall' loops, and can result in reducing the size of, or removing, iterators.

DISTRIBUTIVE TRANSFORMATIONS

(∀x1 ∈ s1, ..., xn ∈ sn | k) if P then BLOCK ENDIF; END ∀;
    <=> (∀x1 ∈ s1, ..., xn ∈ sn | k & P) BLOCK END ∀;
    <=> (∀x1 ∈ s1, ..., xn ∈ sn | P & k) BLOCK END ∀;
    <=> if P then (∀x1 ∈ s1, ..., xn ∈ sn | k) BLOCK END ∀; ENDIF;
where x1, ..., xn do not occur free in P.


v.  Commonly Occurring Transformations Preparatory to Formal Differentiation

P1.  #{x ∈ s | K} => [+: x ∈ s | K] 1
P2.  ∃x ∈ s | k => ([+: x ∈ s | k] 1) ≠ 0
P3.  ∀x ∈ s | k => ([+: x ∈ s | ¬k] 1) = 0
       where x is not used beyond the quantifier
P4.  ∃x1 ∈ s1, ..., xn ∈ sn | k <=> [or: x1 ∈ s1, ..., xn ∈ sn] k
P5.  ∀x1 ∈ s1, ..., xn ∈ sn | k <=> [&: x1 ∈ s1, ..., xn ∈ sn] k
       where x1, ..., xn are not used beyond the quantifier
P6.  ∃x ∈ s | k => ∃x ∈ {y ∈ s | <k, x \ y>}
P7.  ∀x ∈ s | k => ∀x ∈ {y ∈ s | <k, x \ y>}
P8.  s = nullset => ([+: y ∈ s] 1) = 0
P9.  s INCS R => ([+: y ∈ R | y ∉ s] 1) = 0
P10. S = R => (S INCS R) & (R INCS S)
P11. {x ∈ s | K} <=> s - {x ∈ s | ¬K}
P12. {x ∈ s | K1 & K2} <=> {x ∈ s | K1} - {x ∈ s | ¬K2}
P13. [op: x ∈ s | K] e => [op: x ∈ {y ∈ s | <K, x \ y>}] e
P14. [or: x ∈ s] K => ([+: x ∈ s | K] 1) ≠ 0
P15. [&: x ∈ s] K => ([+: x ∈ s | ¬K] 1) = 0
P16. S ≠ nullset => ([+: y ∈ S] 1) ≠ 0
P17. S * T => {x ∈ S | x ∈ T}
P18. S - T => {x ∈ S | x ∉ T}
P19. S := S + A => (∀x ∈ (A - S))
                       S := S + {x};
                   end ∀;
P20. S := S - A => (∀x ∈ (A * S))
                       S := S - {x};
                   end ∀;

In these rules y is a new variable name which is generated.
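The count-based forms produced by P2, P3, P14 and P15 are the same kind of expression as Form9 and Form10, whose derivatives appear in the D table. P19 and P20 likewise turn a bulk update into single-element changes.

The Python sketch below spells out P2, P3, P19 and P20 as ordinary functions. It is our own illustration, and the function names are hypothetical.

```python
def exists_via_count(s, k):
    """P2: ∃x ∈ s | k  =>  ([+: x ∈ s | k] 1) ≠ 0"""
    return sum(1 for x in s if k(x)) != 0

def forall_via_count(s, k):
    """P3: ∀x ∈ s | k  =>  ([+: x ∈ s | ¬k] 1) = 0"""
    return sum(1 for x in s if not k(x)) == 0

def add_elementwise(S, A):
    """P19: S := S + A  =>  (∀x ∈ (A - S)) S := S + {x}; end ∀;"""
    for x in A - S:          # A - S is computed once, before S starts to change
        S.add(x)

def remove_elementwise(S, A):
    """P20: S := S - A  =>  (∀x ∈ (A * S)) S := S - {x}; end ∀;"""
    for x in A & S:
        S.discard(x)

s = {1, 2, 3, 4}
print(exists_via_count(s, lambda x: x > 3), forall_via_count(s, lambda x: x > 1))   # True False
```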


vi.  Productions Derivable from Previous More Basic Rules and Useful for Cleanup After Formal Differentiation

C1.  {z ∈ {y} | K} => if <K, z \ y> then {y} else nullset
C2.  (∀z ∈ {y} | K) BLOCK END ∀; => if <K, z \ y> then <BLOCK, z \ y> ENDIF;
C3.  ∃z ∈ {y} | K => <K, z \ y>
C4.  ∀z ∈ {y} | K => <K, z \ y>
C5.  [op: z ∈ {y} | K] e => if <K, z \ y> then <e, z \ y> else nilpot(op)
       where, e.g., nilpot(+) = 0 for addition,
                              nullset for union,
                              nulltuple for tuple concatenation, etc.
C6.  [op: x ∈ {y ∈ S | K1}] K2 => [op: x ∈ S | <K1, y \ x>] K2
C7.  [+: z ∈ {y}] 1 => 1
C8.  {z ∈ nullset | K} => nullset
C9.  (∀z ∈ nullset | K) BLOCK END ∀; => ε
       where ε is the empty string
C10. ∃z ∈ nullset | K => false
C11. ∀z ∈ nullset | K => true


Other  Cleanup  Transformations 

C12.    3x  G    (i/   c    t/?en    el    else    e2)  |k 
=*      if   c    then    3x  G    el|K 
else    3x   G    e2| K 
C13.    Vx   G     (i/   c    t/zen    el    else    e2)|K 
=*■   t/  c    t/2en    Vx   G    el|K 
else    Vx  G   e2 I K 
C14 .    if    {if   c    then    cl    eZse    c2)     t^zen 

BLOCKl    else 
BL0CK2 
endif 

if    (c    &    cl)    or   He    &    c2    t/zen 

BLOCKl    else 

BL0CK2 

endif 
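Rules C1, C5, C7, C8, C10 and C11 all evaluate a construct over a range with at most one element. As a hedged illustration only, the Python sketch below models a compound operation [op: z ∈ s | K] e as a fold that starts from nilpot(op); the table `NILPOT` and the functions `fold` and `c5` are hypothetical names, and the three operators are assumed to be the ones listed under C5.

```python
# Hypothetical Python model of compound operations and rule C5.
from functools import reduce

# Each operator paired with its identity element, which plays the role of nilpot(op).
NILPOT = {
    "add": (lambda a, b: a + b, 0),               # addition
    "union": (lambda a, b: a | b, frozenset()),   # set union
    "concat": (lambda a, b: a + b, ()),           # tuple concatenation
}


def fold(op, elems, cond, expr):
    """[op: z in elems | cond(z)] expr(z); an empty range yields nilpot(op)."""
    fn, unit = NILPOT[op]
    return reduce(fn, (expr(z) for z in elems if cond(z)), unit)


def c5(op, y, cond, expr):
    """Right-hand side of C5: if <K, z \\ y> then <e, z \\ y> else nilpot(op)."""
    return expr(y) if cond(y) else NILPOT[op][1]
```

For instance, `fold("add", {3}, lambda z: z > 0, lambda z: z * z)` and `c5("add", 3, lambda z: z > 0, lambda z: z * z)` both give 9. Starting the fold from the identity element is what makes the empty-range case (C8, C10, C11) and the singleton case (C5, C7) agree with the rewritten forms.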
vii. Organizing Rules into an Automatic Production System
and an Efficient Implementation

This  section  describes  a  production  system  which  can 
eliminate  unnecessary  set  former,  intersection,  set  differ- 
ence, and  union  operations  found  in  a  program  region  R. 
The  system  consists  of  the  following  5  rewrite  rules  which 
can  be  applied  in  any  order  exhaustively  throughout  R: 

{x ∈ {y ∈ T | K1} | K2}  =>  {w ∈ T | <K1, y \ w> & <K2, x \ w>}

    where w is a unique generated variable

S * T  =>  {x ∈ S | x ∈ T}

S - T  =>  {x ∈ S | x ∉ T}

x ∈ (S + T)  =>  x ∈ S or x ∈ T        /* membership test */

x ∈ {y ∈ T | K}  =>  x ∈ T & <K, y \ x>        /* membership test */

The above rules can be guided by an efficient 'chain-
ing' mechanism discussed in general terms by Loveman [L1]
and detailed by Kibler et al. [KI1]. In what follows
we will use Kibler's approach to design an optimized
production system for those rules. In addition to standard
production system features, such as the use of a parse tree
representation T of a program and a set of rewrite rules P,
Kibler's system has facilities for judiciously selecting
productions according to their likelihood of succeeding
and for limiting the range of search through T
for a place where such productions can be applied. With
each node N of T, Kibler associates a locality, defined
by the subtree rooted at N, and an instruction stack S(N)
containing a possibly empty sequence of directives; each
directive has the form <location><transformation list>,
where <location> is either HERE or UP and
<transformation list> is a list of production names.
Also, with each production p of P, Kibler associates a
sequence of directives of the same form as was just
described for stacks.

The production system begins in a starting state
[T0, S0, N0], where T0 is an initial tree, S0 is a mapping
from nodes of T0 into stacks, and N0 is a node in T0.
The transition rule which takes one state into the next
is implemented by the following SETL program:

/*  the parse tree is represented by a set T of blank      */
/*  atoms along with predecessor and successor maps Tpred  */
/*  and Tsucc defined on T; N is a node in T and defines   */
/*  the current locality; for each node n ∈ T, S(n) is a   */
/*  sequence of directives; for each production p ∈ P,     */
/*  prods(p) is a sequence of directives, Lhs(p) is the    */
/*  Lhs pattern and Rhs(p) is the Rhs macro for p          */
Define Psys;

(while N ≠ Ω)          /* halt when locality is undefined */
    (while S(N) ≠ nulltuple)
        Inst := S(N)(1);       /* select instruction */
        S(N) := S(N)(2:);      /* pop stack */
        if Inst = 'HERE' then
            continue;
        elseif Inst = 'UP' then
            Top := S(N);
            S(N) := nulltuple;         /* empty stack */
            N := Tpred(N);             /* go up */
            S(N) := Top + S(N);
            continue;
        elseif (Temp := Dmatch(N, Lhs(Inst), Pfunc)) ≠ false then
                /* cf. Appendix E(ii) for Dmatch */
            N := Expand(Rhs(Inst), Pfunc);
                /* define new locality; cf. Appendix E(iii) for Expand */
            S(N) := prods(Inst) + S(Temp);
            Replace(Temp, N);
        end if;
    end while;
    N := Tpred(N);     /* enlarge locality when stack is empty */
end while;
end Psys;

In  order  to  illustrate  the  previous  mechanism  with  the 
five  transformations  presented  earlier,  we  list  these  rules 
again  but  with  required  additional  information. 
NAME:  P1      / structure set intersection /
ENABLING CONDITION:  TYPE(S,T) = set,  x not free in S, T.
Rule:   S * T  =>  {x ∈ S | x ∈ T}
Directions:   HERE P3 P5 P4 P1 P2 UP P3 P5 P2

NAME:  P2      / structure set difference /
EC:   TYPE(S,T) = set,  x not free in S, T.
Rule:   S - T  =>  {x ∈ S | x ∉ T}
Directions:   HERE P3 P5 P4 P2 P1 UP P3 P5 P1

NAME:  P3      / set former combinator /
EC:   w not free in S, K, J
Rule:   {x ∈ {y ∈ S | K} | J}  =>  {w ∈ S | <K, y \ w> & <J, x \ w>}
Directions:   HERE P3 UP P3 P5

NAME:  P4      / union removal /
EC:   TYPE(S,T) = set,  TYPE(result) = Boolean
Rule:   x ∈ (S + T)  =>  x ∈ S or x ∈ T
Directions:   HERE P4

NAME:  P5      / set former removal /
EC:   TYPE(result) = Boolean
Rule:   x ∈ {y ∈ S | K}  =>  x ∈ S & <K, y \ x>
Directions:   HERE P5
As an example, we consider a SUBSETL syntax tree for
the expression (S1 * (S2 + S3)) - S4 and manually follow
the linked transformations and changes in locality stacks.
More specifically, consider the initial tree

[Figure: syntax tree of (S1 * (S2 + S3)) - S4; the current locality is the root, whose stack holds P1 P2 P3]

The initial stack attached to the current locality contains
P1 at the top and P3 at the bottom. Processing of this tree
takes place as follows:

P1 succeeds
[Figure: tree after P1; stack of the new set former locality: P3 P5 P4 P1 P2 UP P3 P5 P2]

P4 succeeds
[Figure: tree after P4; stacks: root P2 P3; set former over S1: P1 P2 UP P3 P5 P2; new 'or' locality: P4]

P2 succeeds
[Figure: tree after P2; stack of the new set former locality: P3 P5 P4 P2 P1 UP P3 P5 P1 P2 P3]

P3 succeeds
[Figure: tree after P3: a single set former over w and S1]

The locality stacks are now empty, and the transformation
(S1 * (S2 + S3)) - S4  =>  {w ∈ S1 | (w ∈ S2 or w ∈ S3) & (not w ∈ S4)}
has been accomplished, only one user directive being required.
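The hand trace above can be replayed mechanically. The following Python sketch is an assumption-laden model, not the SETL program Psys itself: expression trees are small Python objects, each production P1 to P5 is a Python function that either rewrites a node or returns None, the Dmatch search is a preorder walk of the locality, P5 is restricted to a variable on the left of ∈, and an UP directive issued at the root is assumed to leave the locality at the root. All names (`Node`, `p1` to `p5`, `DIRECTIONS`, `psys`, `example`) are hypothetical.

```python
# Hypothetical Python model of Psys with productions P1-P5; names are illustrative.
import itertools

class Node:
    """Tree node; op is '-', '*', '+', 'or', '&', 'not', 'in', 'set' or 'var'."""
    def __init__(self, op, *kids, name=None):
        self.op, self.kids, self.name, self.parent = op, list(kids), name, None
        for k in self.kids:
            k.parent = self

    def __str__(self):
        a = [str(k) for k in self.kids]
        if self.op == 'var':
            return self.name
        if self.op == 'set':                        # {name in domain | condition}
            return "{%s in %s | %s}" % (self.name, a[0], a[1])
        if self.op == 'in':
            return "%s in %s" % (a[0], a[1])
        if self.op == 'not':
            return "not (%s)" % a[0]
        return "(%s %s %s)" % (a[0], self.op, a[1])

def var(name):
    return Node('var', name=name)

def subst(t, old, new):
    """<t, old \\ new>: fresh copy of t with variable old renamed to new."""
    if t.op == 'var':
        return var(new if t.name == old else t.name)
    return Node(t.op, *[subst(k, old, new) for k in t.kids], name=t.name)

def p1(n, fresh):   # S * T  =>  {x in S | x in T}
    if n.op == '*':
        x = 'x%d' % next(fresh)
        return Node('set', n.kids[0], Node('in', var(x), n.kids[1]), name=x)

def p2(n, fresh):   # S - T  =>  {x in S | not x in T}
    if n.op == '-':
        x = 'x%d' % next(fresh)
        cond = Node('not', Node('in', var(x), n.kids[1]))
        return Node('set', n.kids[0], cond, name=x)

def p3(n, fresh):   # {x in {y in S | K} | J}  =>  {w in S | <K,y\w> & <J,x\w>}
    if n.op == 'set' and n.kids[0].op == 'set':
        inner, w = n.kids[0], 'w%d' % next(fresh)
        cond = Node('&', subst(inner.kids[1], inner.name, w),
                    subst(n.kids[1], n.name, w))
        return Node('set', inner.kids[0], cond, name=w)

def p4(n, fresh):   # x in (S + T)  =>  x in S or x in T
    if n.op == 'in' and n.kids[1].op == '+':
        x, (s, t) = n.kids[0], n.kids[1].kids
        return Node('or', Node('in', x, s), Node('in', subst(x, None, None), t))

def p5(n, fresh):   # x in {y in S | K}  =>  x in S & <K, y\x>  (x must be a variable)
    if n.op == 'in' and n.kids[1].op == 'set' and n.kids[0].op == 'var':
        x, sf = n.kids[0], n.kids[1]
        return Node('&', Node('in', x, sf.kids[0]), subst(sf.kids[1], sf.name, x.name))

PRODS = {'P1': p1, 'P2': p2, 'P3': p3, 'P4': p4, 'P5': p5}
DIRECTIONS = {k: v.split() for k, v in {
    'P1': 'HERE P3 P5 P4 P1 P2 UP P3 P5 P2', 'P2': 'HERE P3 P5 P4 P2 P1 UP P3 P5 P1',
    'P3': 'HERE P3 UP P3 P5', 'P4': 'HERE P4', 'P5': 'HERE P5'}.items()}

def dmatch(n, prod, fresh):
    """Preorder search of locality n for the first node that prod rewrites."""
    new = prod(n, fresh)
    if new is not None:
        return n, new
    for k in n.kids:
        found = dmatch(k, prod, fresh)
        if found:
            return found
    return None

def psys(root, n, stacks, fresh):
    while n is not None:                        # halt when locality is undefined
        while stacks.get(n):
            inst = stacks[n].pop(0)             # select instruction, pop stack
            if inst == 'HERE':
                continue
            if inst == 'UP':
                top = stacks.pop(n)
                n = n.parent or n               # assumption: UP at the root stays put
                stacks[n] = top + stacks.get(n, [])
                continue
            found = dmatch(n, PRODS[inst], fresh)
            if found:
                temp, n = found                 # rewritten subtree is the new locality
                stacks[n] = DIRECTIONS[inst] + stacks.pop(temp, [])
                if temp.parent is None:         # Replace(Temp, N)
                    root = n
                else:
                    sib = temp.parent.kids
                    sib[sib.index(temp)] = n
                n.parent = temp.parent
        n = n.parent                            # enlarge locality when stack is empty
    return root

def example():
    """(S1 * (S2 + S3)) - S4 with the single user directive P1 P2 P3 at the root."""
    s1, s2, s3, s4 = (var(s) for s in ('S1', 'S2', 'S3', 'S4'))
    root = Node('-', Node('*', s1, Node('+', s2, s3)), s4)
    return psys(root, root, {root: ['P1', 'P2', 'P3']}, itertools.count(1))
```

Calling `str(example())` gives `{w3 in S1 | ((w3 in S2 or w3 in S3) & not (w3 in S4))}`, which agrees with the result derived by hand; the bound variable is named w3 only because of the sketch's fresh-name counter. The productions succeed in the same order as in the trace (P1, P4, P2, P3), and the intermediate stacks match those shown for each step.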
viii. A Production System for Simple Automatic Jamming

A recent paper by William Burge [BU] describes a
method of loop jamming for optimizing nested recursively
defined Lisp functions. All of Burge's techniques apply
to corresponding iterative SETL forms, and serve to remove
nesting from among set formers, tuple formers, and compound
operations. An example, also considered by Burge, is to
find the sum of the squares of the odd numbers of a tuple t,
i.e., SUM(SQUARE(FILTER(t))),

where       FILTER(t)  =  [x ∈ t | odd(x)],
            SQUARE(t)  =  [x ** 2: x ∈ t],
    and     SUM(t)     =  [+: x ∈ t] x.

Several applications of 'jamming' transform the high level
expression

[+: x ∈ [y ** 2: y ∈ [z ∈ t | odd(z)]]] x,

resulting from procedure integration, into the more efficient
calculation

[+: w ∈ t | odd(w)] w ** 2.
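The effect of jamming on Burge's example can be seen by evaluating both forms. In the Python sketch below (an illustration under the assumption that SETL tuples behave like Python lists; the function names are hypothetical), `unjammed` builds the two intermediate tuples produced by FILTER and SQUARE, while `jammed` computes [+: w ∈ t | odd(w)] w ** 2 in a single pass.

```python
# Hypothetical Python illustration of Burge's jamming example.
def odd(x):
    return x % 2 == 1

def filter_odd(t):        # FILTER(t) = [x in t | odd(x)]
    return [x for x in t if odd(x)]

def square(t):            # SQUARE(t) = [x ** 2: x in t]
    return [x ** 2 for x in t]

def total(t):             # SUM(t) = [+: x in t] x
    return sum(t)

def unjammed(t):
    # [+: x in [y ** 2: y in [z in t | odd(z)]]] x : two intermediate tuples
    return total(square(filter_odd(t)))

def jammed(t):
    # [+: w in t | odd(w)] w ** 2 : one loop, no intermediate tuples
    acc = 0
    for w in t:
        if odd(w):
            acc += w ** 2
    return acc
```

Both functions return 35 for the tuple [1, 2, 3, 4, 5], but the jammed form allocates no intermediate tuples.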

In  general,  this  kind  of  transformation  can  be 
achieved   automatically  in  a  program  region  R  by  exhaust- 
ively performing  the  (A)  productions  listed  below  and  then 
exhaustively  performing  the  (B)  productions. 
A.  Preparatory and Jamming Transformations

[w ∈ t]  =>  [w: w ∈ t]

{w ∈ s}  =>  {w: w ∈ s}

[w ∈ t | K]  =>  [w: w ∈ t | K]

{w ∈ s | K}  =>  {w: w ∈ s | K}

[e: w ∈ t]  =>  [e: w ∈ t | TRUE]

{e: w ∈ s}  =>  {e: w ∈ s | TRUE}

[op: w ∈ s] e  =>  [op: w ∈ s | TRUE] e

[e1: w ∈ [e2: z ∈ t | K2] | K1]
    =>  [<e1, w \ <e2, z \ y>>: y ∈ t | <K2, z \ y> & <K1, w \ <e2, z \ y>>]

    where y isn't free in e1, e2, K1, K2

{e1: w ∈ {e2: z ∈ s | K2} | K1}
    =>  {<e1, w \ <e2, z \ y>>: y ∈ s | <K2, z \ y> & <K1, w \ <e2, z \ y>>}

    where y isn't free in e1, e2, K1, K2

[op: w ∈ [e2: z ∈ t | K2] | K1] e1
    =>  [op: y ∈ t | <K2, z \ y> & <K1, w \ <e2, z \ y>>] <e1, w \ <e2, z \ y>>

    where y isn't free in e1, e2, K1, K2

[op: w ∈ {z: z ∈ s | K2} | K1] e1
    =>  [op: y ∈ s | <K2, z \ y> & <K1, w \ y>] <e1, w \ y>
B.  Cleanup Transformations

TRUE & K  =>  K

K & TRUE  =>  K

[e: x ∈ t | TRUE]  =>  [e: x ∈ t]

{e: x ∈ s | TRUE}  =>  {e: x ∈ s}

[op: x ∈ t | TRUE] e  =>  [op: x ∈ t] e

[x: x ∈ t | K]  =>  [x ∈ t | K]

{x: x ∈ s | K}  =>  {x ∈ s | K}

{x ∈ s | TRUE}  =>  {x ∈ s}

[x ∈ t | TRUE]  =>  [x ∈ t]

{x ∈ s}  =>  s

[x ∈ t]  =>  t
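The central jamming rule of group A, which fuses [e1: w ∈ [e2: z ∈ t | K2] | K1] into a single former over t, is in effect function composition. A minimal Python sketch, assuming a tuple former can be modeled by a source sequence, an element expression and a condition (the `Former` class, `evaluate` and `jam` are hypothetical names), applies that rule repeatedly. The defaults for `e` and `cond` play the part of the preparatory rules that supply [w: ...] and | TRUE.

```python
# Hypothetical Python model of tuple formers and the tuple-former jamming rule.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Former:
    """Model of the tuple former [e: w in src | cond]."""
    src: Any                                        # a list, or another Former
    e: Callable[[Any], Any] = lambda w: w           # [w in t] => [w: w in t]
    cond: Callable[[Any], bool] = lambda w: True    # missing condition => TRUE


def evaluate(f):
    """Evaluate a (possibly nested) former to a Python list."""
    src = evaluate(f.src) if isinstance(f.src, Former) else f.src
    return [f.e(w) for w in src if f.cond(w)]


def jam(f):
    """Rewrite nested formers until the source is a plain sequence:
    [e1: w in [e2: z in t | K2] | K1]
        => [e1(e2(y)): y in t | K2(y) & K1(e2(y))]"""
    while isinstance(f.src, Former):
        inner = f.src
        e1, k1, e2, k2 = f.e, f.cond, inner.e, inner.cond
        # default arguments freeze the current e1, e2, k1, k2 in each lambda
        f = Former(inner.src,
                   lambda y, e1=e1, e2=e2: e1(e2(y)),
                   lambda y, k1=k1, e2=e2, k2=k2: k2(y) and k1(e2(y)))
    return f
```

Applied to the nested form of Burge's example, `jam` yields a single former over t whose evaluation gives [1, 9, 25] for t = [1, 2, 3, 4, 5], the same as the nested form.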
ix. Named Productions for Use in Topological Sort
Example of Chapter 3.

NAME:   ∃FORMAT

Rule:   ∃x ∈ s | K  =>  ∃x ∈ {u ∈ s | <K, x \ u>}

NAME:   SETEQNL

Rule:   s = nullset  =>  ([+: u ∈ s] 1) eq 0

where u is a unique name in TOPSORT's namespace, x, s, K
are patterns, and <, \, > are metasymbols used to denote
substitution.

NAME:   {TO*

Rule:   {z ∈ s | z = y}  =>  {y} * s

where z isn't free in y

NAME:   *SIMP
Rule:   A * B  =>  A
ENABLING PRED:  B INCS A

NAME:   {=NL

Rule:   {x ∈ s | K}  =>  nullset

ENABLING PRED:  ∀x ∈ s | not K

NAME:   IDEM

Rule:   A + nullset  =>  A

NAME:   USELESS=

Rule:   A = A;  =>  ε

NAME:   EMPTYELSE

Rule:   if P then B else ε end if;  =>  if P then B end if;

where ε is the empty string
NAME:   {TOIF

Rule:   {z ∈ {y} | K(z)}  =>  if K(y) then {y} else nullset

where z is not within the scope of a bound variable
y in K.

NAME:   DISTIF1

Rule:   A = B + if P then x else y;
        =>  if P then A = B + x; else A = B + y; endif;

where A cannot occur free in P.

NAME:   {TO∀

Rule:   A = {x ∈ s | K}  =>  (∀x ∈ s | K) A = A + {x};

where A is not free in s or K.

NAME:   ∀SIMP*

Rule:   (∀x ∈ s | x ∈ T) BLOCK  =>  (∀x ∈ s * T) BLOCK

NAME:   *COMMUTE

Rule:   A * B  =>  B * A

where B, A are disjoint.

NAME:   ∀CONC

Rule:   (∀x1 ∈ s1)(∀x2 ∈ s2 | K) BLOCK
        =>  (∀x1 ∈ s1, x2 ∈ s2 | K) BLOCK

NAME:   ∀COMMUTE

Rule:   (∀x1 ∈ s1, x2 ∈ s2 | K) BLOCK
        =>  (∀x2 ∈ s2, x1 ∈ s1 | K) BLOCK

where s1, x1 are disjoint with x2, s2, and BLOCK is order independent.
NAME:   JAM1

Rule:   (∀x ∈ s) BLOCK1 end ∀; (∀y ∈ s) BLOCK2 end ∀;
        =>  (∀w ∈ s) <BLOCK1, x \ w>; <BLOCK2, y \ w> end ∀;

ENABLING PRED:  BLOCK1 and BLOCK2 are disjoint, and
w is a unique generated name.

NAME:   ∀BRKUP

Rule:   (∀x1 ∈ s1, x2 ∈ s2 | K) BLOCK
        =>  (∀x1 ∈ s1)(∀x2 ∈ s2 | K) BLOCK

NAME:   DEADELIM FORMAT(SCOPE = L)
        'Remove dead code from region L'

Rule:   EXECUTE(DEADELIM)
        'Rule executes a procedure'

NAME:   VSUBST FORMAT(SCOPE = S)

Rule:   EXECUTE(VSUBST)

VSUBST substitutes the expression e on the right of
the first matched assignment statement A for occurrences
of the variable name V (the target on the left of that
assignment) in statement number s, where s is a user-supplied
parameter. The value of the ud map applied to occurrences of V
in s must equal the program point p containing A. Furthermore,
all paths from p to s must be clear of definitions of
variables of e.
Appendix E.  Assorted Utility Routines for a Source to Source
Transformational Implementation

i.  The Unparser

The Unparser procedure shown below is used to print the
source code of a SUBSETL program, given the program in
parse tree form. We assume that the parse tree is based on
the grammar given in Appendix A and is generated by any
suitable parser (for which we omit any further description).
The tree will consist of a set N of nodes (each node
implemented as a blank atom) and a map Tsucc associating each
node n with an ordered tuple Tsucc(n) of successor nodes.
Aside from Tsucc, we also make use of a number of other maps
defined on N. These include the following.

1.  Leaf(n) is true when the node n is a leaf and
false otherwise.

2.  Label(n) will be a token value if n is a leaf, or
the lexical type (e.g. 'block', 'statement') associated
with an internal node n.

3.  Number(n) represents the statement number of node n
when Label(n) = 'statement'.

The following is a SETL version of Unparser:

/* Program is the root node of the parse tree */
Define Unparser(Program);
    /* initialize global variables */
    line    := nullchar;   /* line is the line buffer */
    Columns := 60;         /* max no. of chars for line */
    Margin  := 10;         /* source line begins after column 9 */
    Indent  := 0;

    Printpart(Program);    /* begin unparsing */
    Flush;                 /* print last line */
end;

/* unparse the block represented by parameter Block */
Define Printblock(Block);
    Indent := Indent + 1;          /* indent statements of block */
    (∀State ∈ Tsucc(Block))
        Printstate(State);         /* print block statements */
    end ∀;
    Indent := Indent - 1;          /* restore previous indentation */
end;

/* unparse the statement State */
Define Printstate(State);
    Flush;                         /* print current buffer line */
    Putleaf(Number(State));        /* add statement no. to line */
    Tab(Margin + Indent*2);        /* begin statement in correct col */
    (∀n ∈ Tsucc(State))
        Printpart(n);              /* unparse the parts of statement */
    end ∀;
end;
/* print the subpart node of a statement */
Define Printpart(node);
    if Label(node) = 'block' then
        Printblock(node);
    elseif Leaf(node) then
        Putleaf(Label(node));      /* print token */
        Tab(#line + 2);            /* add 2 spaces after each token */
    else
        (∀x ∈ Tsucc(node))         /* print subparts of node */
            Printpart(x);
        end ∀;
    end if;
end;

/* add the token string to the source line */
Define Putleaf(string);
    /* if string cannot fit on line then print line and */
    /* add string to next line, which is indented       */
    if #line + #string > Columns then
        Flush;
        Tab(Margin + Indent*2 + 2);
    end if;
    line := line + string;         /* append token to line buffer */
end;

/* add spaces to line up to the column Col */
Define Tab(Col);
    line := line + [+: #line < i <= Col] ' ';
end;

/* print line and initialize next line */
Define Flush;
    if #line > 0 then
        Print(line);
    end if;
    line := nullchar;
end;
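For experimentation outside SETL, the Unparser can be transcribed almost line for line. The Python sketch below is hypothetical: it assumes the maps Tsucc, Leaf, Label and Number are given as dictionaries keyed by node, converts statement numbers to strings before printing, and collects output lines in a list instead of calling Print.

```python
# Hypothetical Python transcription of Unparser; maps are dicts keyed by node.
class Unparser:
    def __init__(self, tsucc, leaf, label, number, columns=60, margin=10):
        self.tsucc, self.leaf, self.label, self.number = tsucc, leaf, label, number
        self.columns, self.margin = columns, margin   # Columns, Margin
        self.indent, self.line, self.out = 0, "", []  # Indent, line buffer, printed lines

    def run(self, program):
        self.printpart(program)        # begin unparsing
        self.flush()                   # print last line
        return self.out

    def printblock(self, block):
        self.indent += 1               # indent statements of block
        for state in self.tsucc[block]:
            self.printstate(state)
        self.indent -= 1               # restore previous indentation

    def printstate(self, state):
        self.flush()                   # print current buffer line
        self.putleaf(str(self.number[state]))
        self.tab(self.margin + self.indent * 2)
        for n in self.tsucc[state]:
            self.printpart(n)

    def printpart(self, node):
        if self.label[node] == 'block':
            self.printblock(node)
        elif self.leaf[node]:
            self.putleaf(self.label[node])        # print token
            self.tab(len(self.line) + 2)          # two spaces after each token
        else:
            for x in self.tsucc[node]:
                self.printpart(x)

    def putleaf(self, s):
        if len(self.line) + len(s) > self.columns:       # token does not fit:
            self.flush()                                 # start an indented
            self.tab(self.margin + self.indent * 2 + 2)  # continuation line
        self.line += s

    def tab(self, col):
        self.line += " " * max(0, col - len(self.line))  # pad up to column col

    def flush(self):
        if self.line:
            self.out.append(self.line)
        self.line = ""
```

With a one-statement block whose leaves are x, :=, y and ;, the sketch prints a single line: the statement number 1, padding up to column 12, and then the tokens separated by two spaces. Lowering `columns` forces the continuation lines produced by `putleaf`.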

ii.  Pattern Matcher

Dmatch is a function subprogram which attempts to
match a pattern, Pat, to a subtree within a SUBSETL parse
tree whose root node we call Tree, using a depth first
search through Tree. The parse tree structure is as
described in connection with the Unparser algorithm.
Patterns will have the same structure as trees. Maps Pleaf,
Plabel, and Psucc will be defined on the nodes of a pattern
tree and serve much the same purpose as Leaf, Label, and
Tsucc respectively. Moreover, if n is a leaf of a pattern
tree, then application of the boolean valued function
Literal(n) will yield true when Plabel(n) is a literal and
false when it is a pattern variable. Dmatch will return
the root of the subtree within Tree at which the first
successful match occurs. When matching succeeds, Dmatch will
also return the map Pfunc as a parameter. This map associates
each pattern variable x of Pat with the root Pfunc(x) of a
subtree of Tree matched by x. Note that Pfunc is useful
for macro expansion subsequent to matching (cf. the Expand
algorithm in this appendix, section iii). If Dmatch is
unable to find a successful match, then it will return
false.

The actions of the pattern matching routine Match,
which is invoked by Dmatch, have been described in detail
in Chapter 3(B).

The following SETL code implements Dmatch, Match, and
various related auxiliary routines.

/* Tree and Pat are roots of a parse tree */
/* and pattern tree respectively          */
Definef Dmatch(Tree, Pat, Pfunc);
    if Match(Tree, Pat, Pfunc) then
        return Tree;       /* return node where matching succeeds */
    elseif not Leaf(Tree) &
           ∃x ∈ Tsucc(Tree) | (Subtree := Dmatch(x, Pat, Pfunc)) ≠ false then
        return Subtree;
    else
        return false;
    end if;
end;

/* Match is a routine whose sole purpose is to initialize
   the pattern variable map Pfunc. This permits the main
   match procedure Match1 to call itself recursively
   and preserve the previous value of Pfunc */
Definef Match(Tree, Pattern, Pfunc);
    Pfunc := nullset;
    return Match1(Tree, Pattern, Pfunc);
end;
/* Match1 attempts to match the parse tree whose
   root node is Tree by the pattern tree whose root
   node is Pattern. If the match succeeds, for each
   pattern variable x in the pattern tree, Pfunc(x)
   will be the node in the parse tree matching x. */
Definef Match1(Tree, Pattern, Pfunc);
    if Pleaf(Pattern) then
        if Literal(Pattern) then
            return Plabel(Pattern) = Label(Tree);
        elseif Plabel(Pattern) ∉ Dom Pfunc then
            Pfunc(Plabel(Pattern)) := Tree;
            return true;
        else
            return Equals(Pfunc(Plabel(Pattern)), Tree);
        end if;
    elseif not Leaf(Tree) & #Psucc(Pattern) = #Tsucc(Tree) then
        return ∀1 <= n <= #Psucc(Pattern) |
               Match1(Tsucc(Tree)(n), Psucc(Pattern)(n), Pfunc);
    else
        return false;
    end if;
end;

/* Equals is a predicate that decides whether 2 trees
   T1 and T2 have the same values; i.e., the same
   structure and the same leaf values */
Definef Equals(T1, T2);
    if Leaf(T1) or Leaf(T2) then
        return Leaf(T1) & Leaf(T2) & Label(T1) = Label(T2);
    else
        return #Tsucc(T1) = #Tsucc(T2) &
               (∀1 <= i <= #Tsucc(T1) | Equals(Tsucc(T1)(i), Tsucc(T2)(i)));
    end if;
end;
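The same matching logic can be prototyped on a lighter tree encoding. The Python sketch below is an illustration under stated assumptions: a parse-tree leaf is a string token, an internal node is a tuple of its successors (internal labels are not compared, mirroring Match1 above), and a pattern variable is a hypothetical `Var` object; `pfunc` is a dictionary that plays the part of Pfunc.

```python
# Hypothetical Python sketch of Dmatch, Match, Match1 and Equals.
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A pattern variable leaf (Literal is false for it)."""
    name: str


def equals(t1, t2):
    """Same structure and the same leaf values."""
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    return len(t1) == len(t2) and all(equals(a, b) for a, b in zip(t1, t2))


def match1(tree, pat, pfunc):
    if isinstance(pat, str):                 # literal leaf
        return pat == tree
    if isinstance(pat, Var):                 # pattern variable
        if pat.name not in pfunc:
            pfunc[pat.name] = tree
            return True
        return equals(pfunc[pat.name], tree)
    if isinstance(tree, str) or len(pat) != len(tree):
        return False
    return all(match1(t, p, pfunc) for t, p in zip(tree, pat))


def match(tree, pat, pfunc):
    pfunc.clear()                            # Pfunc := nullset
    return match1(tree, pat, pfunc)


def dmatch(tree, pat, pfunc):
    """First subtree (depth first, left to right) matched by pat, or False."""
    if match(tree, pat, pfunc):
        return tree
    if not isinstance(tree, str):
        for x in tree:
            sub = dmatch(x, pat, pfunc)
            if sub is not False:
                return sub
    return False
```

For the pattern X in (S + T), that is `(Var('X'), 'in', (Var('S'), '+', Var('T')))`, searching the tree `('if', ('x', 'in', ('a', '+', 'b')), 'then')` returns the middle subtree and binds X, S and T to 'x', 'a' and 'b'.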
iii.  Macro Expander

The function subprogram Expand performs macro expansion
by generating a tree from a pattern Pat and a pattern variable
map Pfunc. It replaces each pattern variable x within Pat
by a copy of the tree Pfunc(x). The tree generated will then have
the same structure as Pat down to the leaves of Pat.

A utility routine Copytree is used to make a fresh copy
of a tree.

In SETL these substitution routines are as follows:

Definef Expand(Pat, Pfunc);
    if Pleaf(Pat) & not Literal(Pat) then    /* pattern variable */
        Root := Copytree(Pfunc(Plabel(Pat)));
    else
        Root := newat;
        Label(Root) := Plabel(Pat);
        Leaf(Root) := Pleaf(Pat);
        if not Pleaf(Pat) then               /* internal node */
            Tsucc(Root) := [Expand(x, Pfunc): x ∈ Psucc(Pat)];
        end if;
    end if;
    return Root;
end;

Definef Copytree(Tree);
    node := newat;
    Label(node) := Label(Tree);
    Leaf(node) := Leaf(Tree);
    if not Leaf(Tree) then
        Tsucc(node) := [Copytree(x): x ∈ Tsucc(Tree)];
    end if;
    return node;
end;
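Under a tuple encoding of trees (string leaves, tuples for internal nodes, and a hypothetical `Var` class for pattern variables, defined here so the block stands alone), Expand and Copytree reduce to a few lines of Python. This is a sketch under those assumptions. Because Python tuples are immutable, the copy is not strictly needed; it is kept only to mirror Copytree.

```python
# Hypothetical Python sketch of Expand and Copytree on tuple-encoded trees.
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A pattern variable leaf."""
    name: str


def copytree(tree):
    """Fresh copy of a tree whose leaves are str tokens and internal nodes tuples."""
    if isinstance(tree, str):
        return tree
    return tuple(copytree(x) for x in tree)


def expand(pat, pfunc):
    """Build a tree from pat, replacing each Var x by a copy of pfunc[x.name]."""
    if isinstance(pat, Var):
        return copytree(pfunc[pat.name])
    if isinstance(pat, str):                 # literal leaf
        return pat
    return tuple(expand(x, pfunc) for x in pat)
```

Expanding the right-hand side of the union-removal rule, `((Var('X'), 'in', Var('S')), 'or', (Var('X'), 'in', Var('T')))`, with the bindings found by a matcher reproduces the rewritten expression.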
iv.  Utility Routines for Formal Differentiation

The two procedures Regconst and Indvars, shown below,
compute the region constants and induction variables within
a loop L. Both these routines can be used (with minor
adjustments) in all of the FD frameworks discussed in
Chapters III and IV. They make use of the following global
variables:

1.  L, Leaf, Label, Tsucc represent the parse tree for
the loop L.

2.  Vars is the set of variable names used in L.

3.  For each v ∈ Vars, Defs(v) is the set of definition
points in L for v.

4.  F and D are the elementary form and derivative tables.

5.  Pleaf, Pvar, Pleaves and Plabel are maps defined for
patterns (cf. Appendix E(ii)).

Regconst defines the global variable RC, the set of all
nodes n ∈ L for which Text(n) is a region constant expression.
Indvars computes the global map IV, which associates the set
of induction variables IV(x, f) with each pattern variable x
within each elementary form f ∈ F.
/* RC is the set of nodes n ∈ L such that Text(n) is a
   region constant expression */
Define Regconst;
    /* initialize RC to the set of leaf nodes corresponding
       to constants and region constant variables */
    RC := {n ∈ L | Leaf(n) & (Label(n) ∉ Vars or
                              Defs(Label(n)) = nullset)};
    /* find the region constant expressions: an internal node
       is added once all of its successors are in RC */
    (while ∃n ∈ (L - RC) | not Leaf(n) & (∀y ∈ Tsucc(n) | y ∈ RC))
        RC := RC + {n};
    end while;
end;
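The fixed-point loop in Regconst can be sketched in Python as follows. This is an illustration only: the loop L is assumed to be given as a set of node identifiers with dictionaries for Tsucc, Leaf and Label, `defs` maps each variable to its set of definition points in L, and the function name `regconst` is hypothetical. As in the SETL version, an internal node is added once all of its successors are known to be region constants.

```python
# Hypothetical Python sketch of Regconst; the dict-based encoding is an assumption.
def regconst(nodes, tsucc, leaf, label, vars_, defs):
    """Return RC: nodes of L whose text is a region constant expression."""
    # leaves that are constants, or variables with no definition inside L
    rc = {n for n in nodes
          if leaf[n] and (label[n] not in vars_ or not defs.get(label[n]))}
    changed = True
    while changed:                           # repeat until no node can be added
        changed = False
        for n in nodes - rc:
            if not leaf[n] and all(y in rc for y in tsucc[n]):
                rc.add(n)
                changed = True
    return rc
```

For a loop containing i + 1 and k * 2, where i is redefined in the loop and k is not, the sketch marks 1, k, 2 and the product k * 2 as region constants but not i + 1.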

/* Compute the set IV(x, f) of induction variables for every
   pattern variable x in each elementary form f ∈ F */
Define Indvars;
    /* compute the set of region constant variables */
    Rvars := {Label(n): n ∈ RC | Leaf(n) & Label(n) ∈ Vars};

    /* initialize IV */
    IV := nullset;
    (∀f ∈ F, x ∈ Pleaves(f) | Pvar(x))
        IV(Plabel(x), f) := Vars - Rvars;
    end ∀;

    /* find each IV set by a negative transitive closure
       algorithm */
    (while ∃[x, f] ∈ Project(IV, 2), v ∈ IV(x, f), n ∈ Defs(v)
            | not (∃t ∈ D(x, f) | Match(n, t(1), Pfunc) & t(4)))
        /* t(4) is a SETL code block whose value must be true
           for matching to succeed */
        IV(x, f) := IV(x, f) - {v};
    end while;
end;

/* Compute the set of leaf nodes of pattern Pat */
Definef Pleaves(Pat);
    if Pleaf(Pat) then
        return {Pat};
    else
        return [+: x ∈ Psucc(Pat)] Pleaves(x);
    end if;
end;


The following routine, Postorder, computes a tuple
containing the internal (non-leaf) nodes of a parse tree (of
the standard form that we have been using all along) arranged
in postorder, i.e., left to right successors before root order.
/* Traverse successors from left to right, */
/* then visit the root                     */
Definef Postorder(Tree);
    T := nulltuple;
    Post(Tree, T);
    return T;
end;

Define Post(Tree, T);
    if Leaf(Tree) then
        return;
    end if;
    (∀x ∈ Tsucc(Tree) | not Leaf(x))
        Post(x, T);
    end ∀;
    T := T + [Tree];
end;
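A Python transcription of Postorder and Pleaves makes the traversal order easy to check. It is a sketch that assumes the maps are Python dictionaries (`tsucc`/`psucc` map a node to its ordered successors, `leaf`/`pleaf` mark leaves); the function names are hypothetical. Like Post, the sketch records only internal nodes.

```python
# Hypothetical Python sketch of Postorder/Post and Pleaves.
def postorder(tree, tsucc, leaf):
    """Internal nodes of tree in postorder: successors left to right, then root."""
    out = []

    def post(n):
        if leaf[n]:
            return
        for x in tsucc[n]:
            if not leaf[x]:
                post(x)
        out.append(n)                 # T := T + [Tree]

    post(tree)
    return out


def pleaves(pat, psucc, pleaf):
    """Set of leaf nodes of the pattern rooted at pat."""
    if pleaf[pat]:
        return {pat}
    return set().union(*(pleaves(x, psucc, pleaf) for x in psucc[pat]))
```

For a root r with successors a, b, e, where a has leaves c and d, b is a leaf, and e has the leaf f, `postorder` returns ['a', 'e', 'r'].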
v.  Revised Matching and Expansion Routines

The two routines Match and Expand shown below are
more powerful versions of the simpler routines of the
same name described earlier in this appendix. Match accepts
as input parameters the root nodes Tree and Pattern of a
parse tree and pattern tree respectively. Match initializes
Pfunc to the empty set and calls the main matching utility
Match1, which defines the pattern variable map Pfunc.
However, the parameters Tree and Pattern for Match1 must
be tuples whose components are the root nodes of parse
and pattern subtrees. Likewise, the parameter Pattern used
by Expand will be a tuple of root nodes of subpattern trees.

These routines are sufficiently powerful to handle
patterns specified in the FD tables of Appendix C(iii).
Thus, they are fundamental components of the FD implementation
design of Chapter 4.

/* Match initializes Pfunc to nullset and passes the root
   nodes Tree and Pattern to Match1 */
Definef Match(Tree, Pattern, Pfunc);
    Pfunc := nullset;
    return Match1([Tree], [Pattern], Pfunc);
end;

/* Match1 attempts to match an ordered forest 'Pattern'       */
/* to an ordered parse tree forest 'Tree'. During successful  */
/* matching, Pfunc will be built up by associating pattern    */
/* variables x with parse tree nodes Pfunc(x) matched by x.   */
/* When matching fails, Pfunc will be restored to its         */
/* previous state                                             */
Definef Match1(Tree, Pattern, Pfunc);
Local Savfunc;

if Pattern = nulltuple then         /* consider trivial match- */
    if Tree = nulltuple then        /* ing decisions first     */
        return true;
    else
        return false;
    end if;
end if;

P := Pattern(1);                    /* fetch the first pattern */
if Tree = nulltuple & not Control(P) then
    return false;
end if;

if Control(P) then                  /* if pattern is a procedure, */
    if Plabel(P)(P) then            /* execute Plabel(P)(P)       */
        return Match1(Tree, Pattern(2:), Pfunc);
    else
        return false;
    end if;
end if;

T := Tree(1);                /* fetch first parse tree from forest   */
if Pvar(P) then              /* if P is a pattern variable, execute  */
    if not Pvarut(P, T) then /* a utility routine which handles this */
        return false;        /* case; if false is returned, matching */
    end if;                  /* fails; else Pvarut will associate    */
end if;                      /* Pfunc(Plabel(P)) with T              */

if Alt(P) then               /* if P represents alternation, save    */
    Savfunc := Pfunc;        /* Pfunc and match one of the alternands */
    (∀x ∈ Psucc(P))
        if Match1(Tree, x + Pattern(2:), Pfunc) then
            return true;         /* return true on success;   */
        else
            Pfunc := Savfunc;    /* otherwise restore Pfunc   */
        end if;                  /* and try again             */
    end ∀;
    return false;            /* no alternand succeeds */
elseif Literal(P) then
    return Plabel(P) = Label(T) &
           Match1(Tree(2:), Pattern(2:), Pfunc);
elseif Leaf(P) then
    return Match1(Tree(2:), Pattern(2:), Pfunc);
else
    return Match1(Tsucc(T), Psucc(P), Pfunc) &
           Match1(Tree(2:), Pattern(2:), Pfunc);
end if;
end;
/* Pvarut handles pattern variables. The parameters
   P and T are the roots of a pattern and parse tree, resp. */
Definef Pvarut(P, T);
Pname := Plabel(P);      /* the value of Pname is the name of */
                         /* the pattern variable at P         */
if Pgen(P) then          /* if a generated name for the       */
                         /* assignment variable is required   */
    Sname := '#' + Pname;                    /* fetch special name */
    Pfunc(Sname) := Pfunc(Sname) + 1 ort 1;  /* increment counter  */
    Pname := Pname + Code(Pfunc(Sname));     /* generate name      */
end if;

if Pname ∉ Dom Pfunc then    /* if the pattern variable has not     */
    Pfunc(Pname) := T;       /* been previously encountered, record */
    return true;             /* it; else, check for consistency.    */
else
    return Equals(Pfunc(Pname), T);  /* Equals is the same as in    */
end if;                              /* connection with the earlier */
end;                                 /* Match routine in section ii. */
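The forest-based matcher with alternation and control procedures can be prototyped in Python as below. This is a sketch under assumptions not in the text: pattern nodes are hypothetical dataclasses (`Lit`, `PVar`, `Alt`, `PNode`, `Control`), each alternand is itself a list of patterns, a control procedure is a zero-argument Python callable, parse-tree leaves are strings and internal nodes are tuples, and generated names (Pgen) are not modeled.

```python
# Hypothetical Python sketch of the revised Match/Match1 with Pvarut folded in.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Lit:          # literal leaf
    label: str

@dataclass
class PVar:         # pattern variable leaf
    name: str

@dataclass
class Alt:          # alternation; each alternand is a list (forest) of patterns
    alternands: list

@dataclass
class PNode:        # internal pattern node with an ordered forest of successors
    kids: list

@dataclass
class Control:      # procedure executed during matching; False makes it fail
    proc: Callable[[], bool]

def equals(t1, t2):
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    return len(t1) == len(t2) and all(equals(a, b) for a, b in zip(t1, t2))

def match1(tree, pattern, pfunc):
    """Match the forest `pattern` (a list) against the forest `tree` (a list)."""
    if not pattern:                              # trivial matching decisions first
        return not tree
    p, rest = pattern[0], pattern[1:]
    if isinstance(p, Control):
        return p.proc() and match1(tree, rest, pfunc)
    if not tree:
        return False
    t = tree[0]
    if isinstance(p, Alt):
        saved = dict(pfunc)                      # Savfunc := Pfunc
        for alt in p.alternands:
            if match1(tree, list(alt) + rest, pfunc):
                return True
            pfunc.clear()                        # restore Pfunc and try again
            pfunc.update(saved)
        return False
    if isinstance(p, PVar):                      # the role of Pvarut
        if p.name in pfunc and not equals(pfunc[p.name], t):
            return False
        pfunc.setdefault(p.name, t)
        return match1(tree[1:], rest, pfunc)
    if isinstance(p, Lit):
        return p.label == t and match1(tree[1:], rest, pfunc)
    return (not isinstance(t, str)              # PNode: successors, then the rest
            and match1(list(t), p.kids, pfunc)
            and match1(tree[1:], rest, pfunc))

def match(tree, pattern, pfunc):
    pfunc.clear()                                # Pfunc := nullset
    return match1([tree], [pattern], pfunc)
```

With the pattern X in (S + T | S), written as a `PNode` whose last successor is an `Alt` of two alternands, the tree x in (a + b) binds X, S and T, while x in a falls through to the second alternand and binds only X and S.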

/* Pattern is an ordered forest of patterns;        */
/* Expand returns an ordered forest of parse trees. */
Definef Expand(Pattern, Pfunc);
Local Savfunc;
if Pattern = nulltuple then
    return nulltuple;             /* trivial expansion */
end if;

P := Pattern(1);                  /* fetch first pattern */
if Control(P) then                /* if pattern is a procedure, execute */
    if Plabel(P)(P) then          /* Plabel(P)(P) */
        return Expand(Pattern(2:), Pfunc);
    else
        return false;
    end if;
elseif Alt(P) then                /* if P represents alternation,   */
    Savfunc := Pfunc;             /* save Pfunc and try to expand   */
    (∀x ∈ Psucc(P))               /* an alternand successfully      */
        if (Temp := Expand(x + Pattern(2:), Pfunc)) ≠ false then
            return Temp;
        else
            Pfunc := Savfunc;     /* restore Pfunc after each failure */
        end if;
    end ∀;
    return false;
elseif Literal(P) then
    Root := newat;
    Label(Root) := Plabel(P);
elseif not Leaf(P) then
    Root := newat;
    if (Temp := Expand(Psucc(P), Pfunc)) ≠ false then
        Tsucc(Root) := Temp;
    else
        return false;
    end if;
end if;

if Pvar(P) then               /* if P must be assigned, fetch the name */
    Pname := Plabel(P);       /* of the pattern variable */
    if Pgen(P) then           /* if the name is generated, create a new name */
        Sname := '$' + Pname;
        Pfunc(Sname) := Pfunc(Sname) + 1 ort 1;
        Pname := Pname + Code(Pfunc(Sname));
    end if;

    if Ngen(P) then           /* in case a tree is to be stored in Pfunc */
        if Leaf(P) & not Literal(P) then
            Root := newat;                /* create tree */
            Label(Root) := newname;       /* create new variable name */
        end if;
        Pfunc(Pname) := Root;             /* store tree */
    end if;

    if Pname ∉ Dom Pfunc then /* if Pfunc cannot satisfy */
        return false;         /* Pname, expansion fails  */
    end if;

    if Leaf(P) & not Ngen(P) then         /* substitute a copy of the */
        Root := Copytree(Pfunc(Pname));   /* tree matched by Pname    */
    end if;
end if;

if (Temp := Expand(Pattern(2:), Pfunc)) ≠ false then
    return [Root] + Temp;
else
    return false;
end if;
end;

It would be convenient for our FD implementation to
include a compiler for translating the patterns given
in Appendix C(iii) into their tree representation, the
form which must be passed as input to Match and Expand.
Until such a compiler is built, however, we will have to
perform this translation by hand, coding it in SETL. In the
following discussion we give rules for translating pattern
expressions into pattern trees (cf. the informal discussion
of patterns in Chapter 4(B) and the formal syntactic
description of our pattern language in Appendix C(iii)).

Patterns will be implemented using a set N of blank
atoms (representing nodes), a successor map Psucc(n), a root
node r, and an assortment of maps defined on N. Some of
these maps, especially Plabel, Pleaf and Literal, are used
in much the same way as before.
To synthesize a pattern tree from a pattern expression we will use the following order of evaluation.

1. Process pattern expressions from the innermost to the outermost brackets (these include parentheses and predecessor formation brackets). Evaluate subexpressions in the order defined by the following rule.

2. A pattern expression e selected by step 1 is processed by evaluating all concatenations before evaluating alternations. Next, if e is enclosed within parentheses, we evaluate the factor (e) as the value of e; otherwise, if e is the argument of a predecessor formation operation, we evaluate the term [e] according to rule 6 below.

The actual pattern construction steps which must be taken to transform pattern expressions into pattern trees (using the preceding order of evaluation) are as follows:

1. The value of a literal symbol, e.g., ',', is a tuple [n := newat]. On encountering such a symbol, we also execute the following assignments:

      Leaf(n) := Literal(n) := true;
      Plabel(n) := ',';

2. The value of a procedure name, e.g., !Cvar, is a tuple [n := newat]. In processing such a name, we also perform the following actions:

      Leaf(n) := Control(n) := true;
      Plabel(n) := Cvar;     /* Cvar is a procedure name */

3. The value of a pattern variable x is [n := newat]. In processing such a variable we execute the code

      Leaf(n) := Pvar(n) := true;
      Plabel(n) := 'x';

If the pattern variable is dotted (e.g., x.) we also execute

      Pgen(n) := true;

For pattern variables x* we must perform the additional assignment

      Ngen(n) := true;

4. The value of a pattern name C is the value of the pattern expression on the right side of the assignment which defines C. This value will always be a tuple of one or more components, each of which is a blank atom representing the root node of a subpattern tree.

5. If P1 and P2 are two pattern expressions, then the value of the concatenation of P1 with P2, written P1 P2 as in SNOBOL, is the SETL tuple concatenation P1' + P2', where P1' is the value of P1 and P2' is the value of P2.

6. If P is a pattern expression and P' is the value of P, then the value of the predecessor formation [P] is [n := newat], in connection with which we perform

      Psucc(n) := P';

7. If P1, P2, ..., PN are pattern expressions whose values are P1', P2', ..., PN', then the value of the alternation P1 | P2 | ... | PN is [n := newat], in connection with which we execute the code

      Alt(n) := true;
      Psucc(n) := [P1', P2', ..., PN'];
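As an illustration of rules 1, 3, 5, 6 and 7, here is a minimal Python sketch, assuming that integers drawn from a counter stand in for SETL blank atoms (newat) and that Python dicts stand in for the maps Leaf, Literal, Pvar, Pgen, Ngen, Alt, Plabel and Psucc. The helper names literal, pattern_var, concat, pred and alt are hypothetical; each returns the tuple (here a list) that the corresponding rule defines as the value of the expression.

```python
import itertools

_atoms = itertools.count()   # fresh integers stand in for SETL blank atoms (newat)

# Pattern-tree maps, modeled as Python dicts keyed by node.
Leaf, Literal, Pvar, Pgen, Ngen, Alt = {}, {}, {}, {}, {}, {}
Plabel, Psucc = {}, {}


def literal(symbol):
    """Rule 1: a literal symbol such as ',' becomes a one-node tuple."""
    n = next(_atoms)
    Leaf[n] = Literal[n] = True
    Plabel[n] = symbol
    return [n]


def pattern_var(name, dotted=False, starred=False):
    """Rule 3: pattern variable x, with the x. (dotted) and x* variants."""
    n = next(_atoms)
    Leaf[n] = Pvar[n] = True
    Plabel[n] = name
    if dotted:
        Pgen[n] = True
    if starred:
        Ngen[n] = True
    return [n]


def concat(p1, p2):
    """Rule 5: concatenation P1 P2 is the tuple concatenation P1' + P2'."""
    return p1 + p2


def pred(p):
    """Rule 6: predecessor formation [P] is a new node whose successors are P'."""
    n = next(_atoms)
    Psucc[n] = p
    return [n]


def alt(*alternands):
    """Rule 7: alternation P1 | ... | PN is a new node over [P1', ..., PN']."""
    n = next(_atoms)
    Alt[n] = True
    Psucc[n] = list(alternands)
    return [n]


# Example: the pattern  [x. ',' y] | z
tree = alt(pred(concat(concat(pattern_var('x', dotted=True), literal(',')),
                       pattern_var('y'))),
           pattern_var('z'))
```

Rule 4 needs no helper in this sketch: a pattern name is simply bound to the list returned for its defining expression.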

Note that for recursive pattern definitions, e.g.,

(1)  Params = q3. ',' Params | q3.

where the pattern name Params appears on both the right and left hand sides of the same assignment, we first assign [newat], which is the value of the alternation expression on the right hand side of (1), to the pattern name Params. After doing this, we can evaluate the two alternands. The actual SETL code used to evaluate (1) is given below.

      /* evaluate q3. ',' Params */
      T1 := [n := newat];                       /* evaluate q3. */
      Leaf(n) := Pgen(n) := Pvar(n) := true;
      Plabel(n) := 'q3';
      T1 := T1 + [n := newat];                  /* evaluate q3. ',' */
      Leaf(n) := Literal(n) := true;
      Plabel(n) := ',';
      T1 := T1 + (Params := [save := newat]);   /* evaluate q3. ',' Params */
      /* evaluate q3. on the right */
      T2 := [n := newat];
      Leaf(n) := Pgen(n) := Pvar(n) := true;
      Plabel(n) := 'q3';
      /* evaluate alternation */
      Alt(save) := true;
      Psucc(save) := [T1, T2];
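The essential trick in evaluating (1) is that the alternation node is allocated, and bound to Params, before either alternand is built, so the recursive occurrence of Params inside the first alternand simply refers back to that node and the pattern "tree" becomes a cyclic graph. A Python sketch of the same steps, assuming integers from a counter for blank atoms and Python sets for the true-valued flag maps (membership meaning true); dotted_var and literal are hypothetical helpers.

```python
import itertools

_atoms = itertools.count()        # fresh integers stand in for SETL blank atoms
Leaf, Literal, Pvar, Pgen, Alt = set(), set(), set(), set(), set()
Plabel, Psucc = {}, {}


def dotted_var(name):
    """A dotted pattern variable such as q3. (rule 3 plus Pgen)."""
    n = next(_atoms)
    Leaf.add(n)
    Pvar.add(n)
    Pgen.add(n)
    Plabel[n] = name
    return [n]


def literal(symbol):
    """A literal symbol such as ',' (rule 1)."""
    n = next(_atoms)
    Leaf.add(n)
    Literal.add(n)
    Plabel[n] = symbol
    return [n]


# Params = q3. ',' Params | q3.
save = next(_atoms)               # root of the alternation, created first
Params = [save]                   # bind the name before building its body
T1 = dotted_var('q3') + literal(',') + Params
T2 = dotted_var('q3')
Alt.add(save)
Psucc[save] = [T1, T2]
```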
APPENDIX F.  ADDITIONAL CASE STUDIES

In this appendix we explore the potential of formal differentiation for algorithm optimization by studying four more programs. These programs are somewhat more complicated than those studied in Chapter 4, and they require minor adjustments and extensions to the transformations found in Appendix C (iii). These extensions to the F, D, and Init tables point to further extensions which would lead to a fairly complete implementation design of FD for SETL.

The first example considered is a derivation of an efficient bubble sort algorithm. (Note that a similar derivation was first described in [Sch 9, Sch 10].) In its base form, the bubble sort can be written in SETL as follows:

(1)  1   (while ∃ 1 ≤ n < #v | v(n) > v(n+1))
     2      [v(n), v(n+1)] := [v(n+1), v(n)];
     3   end while;

where the input variable v is a tuple of integers to be sorted in place.
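For comparison with the derived versions that follow, here is a direct Python transcription of (1), assuming a Python list for the tuple v and 0-based indices. Because (1) allows any witness n to be chosen, picking the leftmost one is our own choice; note that every iteration rescans the tuple from the start. The function name is illustrative only.

```python
def base_bubble_sort(v):
    """Direct transcription of (1) with 0-based indices: after every swap the
    existential quantifier is re-evaluated by scanning from the left again."""
    while True:
        n = next((m for m in range(len(v) - 1) if v[m] > v[m + 1]), None)
        if n is None:          # no witness: the while loop test fails
            return v
        v[n], v[n + 1] = v[n + 1], v[n]
```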

In order to apply FD to (1) we must first recognize the existential quantifier within (1) as an instance of the general form (18) of Chapter II (c). For this to be the case, we must demand that the predicate v(n) > v(n+1) occurring in line 1 of (1) be independent of #v. This is easily established, since #v is a region constant expression; i.e., by simple range analysis of the variable n we may know that the double indexed assignment to v at line 2 of (1) will not spoil the value of #v. Thus, we can prepare (1) for FD by transforming the expression ∃ 1 ≤ n < #v | v(n) > v(n+1) into (n := [min: 1 ≤ m < #v | v(m) > v(m+1)] m) ≠ Ω.

After this, we can improve the bubble sort by reducing

(2)  n = [min: 1 ≤ m < #v | v(m) > v(m+1)] m.

The derivative code for (2) relative to the changes to v at line 2 of (1) is

(3)  T := if 1 ≤ n & n < n then [n] else nulltuple
        + if 1 ≤ n-1 & n-1 < n then [n-1] else nulltuple
        + if 1 ≤ n+1 & n+1 < n then [n+1] else nulltuple
        + if 1 ≤ n & n < n then [n] else nulltuple;
     [v(n), v(n+1)] := [v(n+1), v(n)];
     if ∃ m ∈ T | v(m) > v(m+1) then
        n := m;
     elseif ¬(v(n) > v(n+1)) then
        n := [min: n+1 ≤ m < #v | v(m) > v(m+1)] m;
     endif;

which is obtained from a general update rule (for multiple indexed assignments) that combines the efficient Rule 2 Technique (21) of Chapter II (c) with the derivative code (9) of Chapter II (c). The transformation is completed by inserting an assignment (2) at the entry to the while loop within (1).

The actual changes to the FD implementation of Chapter 4 necessary to perform the transformations just described involve rather straightforward additions to the F, D, and Init tables and a minor generalization of Rule 2. We regard these extensions as attractive possibilities for future work.

To obtain a final form of the bubble sort, several low level cleanup transformations must be applied. Specifically, the assignment to T within the derivative code (3) can be simplified if we note that both of the relations n < n and n+1 < n can be replaced by false, while the relation n-1 < n can be replaced by true. Further applications of simplifying syntactic transformations of the sort found in [ST2] and Appendix D lead to a more attractive assignment, T := if 2 ≤ n then [n-1] else nulltuple. Still further simplification of (3) is possible if we note that the relation v(n) > v(n+1) (which occurs in the while loop predicate) holds just prior to the multiple assignment [v(n), v(n+1)] := [v(n+1), v(n)], but its negation holds immediately afterwards. Consequently, the IF statement occurring in (3) can be optimized into the following form.

     if ∃ m ∈ T | v(m) > v(m+1) then
        n := m;
     else
        n := [min: n+1 ≤ m < #v | v(m) > v(m+1)] m;
     endif;

The changes to (1) made thus far lead to the following code.

(4)  1   n := [min: 1 ≤ m < #v | v(m) > v(m+1)] m;
     2   (while n ≠ Ω)
     3      T := if 2 ≤ n then [n-1] else nulltuple;
     4      [v(n), v(n+1)] := [v(n+1), v(n)];
     5      if ∃ m ∈ T | v(m) > v(m+1) then
     6         n := m; else
     7         n := [min: n+1 ≤ m < #v | v(m) > v(m+1)] m;
     8      endif;
     9   end while;

One last chain of cleanup transformations will bring us to our goal. First we apply the directive VSUBST,3 (cf. Appendix D (IX)), which replaces the single use of T at line 5 by the conditional expression at line 3 and then deletes line 3. Next we make successive applications of transformations C12, C10, C3, and C14 of Appendix D (VI), followed by the standard simplifying boolean identities of Appendix D (ii), to the IF statement at line 5 of (4). Our final form of the bubble sort is then

(5)  n := [min: 1 ≤ m < #v | v(m) > v(m+1)] m;
     (while n ≠ Ω)
        [v(n), v(n+1)] := [v(n+1), v(n)];
        if 2 ≤ n & v(n-1) > v(n) then
           n := n-1;
        else
           n := [min: n+1 ≤ m < #v | v(m) > v(m+1)] m;
        endif;
     end while;
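The gain in (5) is that the scan for the next inversion never restarts at the left end: after a swap, either the inversion moves one place left, or no inversion remains below n+1 and the search resumes there. A Python sketch of (5), assuming a list for v, 0-based indices (so the test 2 ≤ n becomes n >= 1), and None in place of Ω; first_inversion is a hypothetical helper computing the [min: ...] expression.

```python
def first_inversion(v, start):
    """[min: start <= m < len(v)-1 | v[m] > v[m+1]] m, or None in place of Ω."""
    for m in range(start, len(v) - 1):
        if v[m] > v[m + 1]:
            return m
    return None


def fd_bubble_sort(v):
    """Transcription of (5) with 0-based indices."""
    n = first_inversion(v, 0)
    while n is not None:
        v[n], v[n + 1] = v[n + 1], v[n]
        if n >= 1 and v[n - 1] > v[n]:
            n -= 1                          # the swap created an inversion just to the left
        else:
            n = first_inversion(v, n + 1)   # positions below n+1 hold no inversion
    return v
```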

The manual effort required to apply all of the cleanup transformations mentioned seems exorbitant. It seems likely, however, that efficient production systems of the kind proposed by Kibler and Standish [KI1] and exemplified within Appendix D (VII) and (VIII) might reduce the amount of manual intervention. Such transformation families might be enabled by assertions propagated from loop predicates throughout the program text, and then applied automatically. Methods of this kind tailored to SETL have been worked out by E. Deak [D]. Further research along these lines looks promising.

For another example of algorithm improvement by FD, we consider an algorithm which finds all nonterminals in a context free grammar from which the empty string Λ can be derived. The base form SETL program we use to specify this algorithm accepts the grammar G as input and outputs the appropriate set S of nonterminals. G is represented as a function which maps each nonterminal n into the set G(n) of terms immediately derived from n. Each term t ∈ G(n) is a tuple; each component of t is either a terminal or a nonterminal of G.

We begin our consideration of this example with the following succinct program:

(6)  1   S := {n ∈ Dom G | Λ ∈ G(n)};
     2   (while ∃ n ∈ Dom G | n ∉ S & (∃ t ∈ G(n) | (∀ y ∈ t | y ∈ S)))
     3      S := S + {n};
     4   end while;
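A direct Python transcription of (6), offered only as a reference point. We assume G is a dict from nonterminals to sets of tuples, with the empty tuple () standing for Λ; the function name is illustrative. Each pass of the while loop re-examines every term of every nonterminal.

```python
def nullable_base(G):
    """Transcription of (6).  G maps each nonterminal to a set of tuples;
    the empty tuple () plays the role of the empty string Λ."""
    S = {n for n in G if () in G[n]}                          # line 1
    while True:
        # line 2: search Dom G from scratch on every iteration
        n = next((x for x in G
                  if x not in S and any(all(y in S for y in t) for t in G[x])),
                 None)
        if n is None:
            return S
        S.add(n)                                              # line 3


G = {'S': {('A', 'B')}, 'A': {(), ('a',)}, 'B': {('A', 'A'), ('b',)},
     'C': {('c', 'C')}, 'D': {('C',), ('A', 'S')}}
```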

Next we prepare the while loop predicate of (6) for FD by applying the following transformations (all of which are described in Appendix D (v)): P3 and P13 to the universal quantifier, P2 and P13 to the inner existential quantifier of the predicate, and finally P6 to the outermost quantifier. The predicate which results is

(7)  ∃ n ∈ {x ∈ Dom G | x ∉ S & ([+: t ∈ {y ∈ G(x) | ([+: z ∈ {w ∈ y | w ∉ S}] 1) = 0}] 1) ≠ 0}

To reduce the outermost set former appearing in (7), we can use a single directive:

(8)  $FD,2,W = {x ∈ Dom G | x ∉ S & ([+: t ∈ {y ∈ G(x) | ([+: z ∈ {w ∈ y | w ∉ S}] 1) = 0}] 1) ≠ 0}.

However, to handle (8) the FD implementation design of Chapter 4 must incorporate a few revisions. This is because our current version of algorithm ISETL will fail to recognize that c1(y) = {w ∈ y | w ∉ S} is reducible. In fact ISETL excludes from reduction any expression which, like c1, has set or tuple valued discontinuities. However, when the range of values for a set or tuple valued discontinuity y can be bounded explicitly, e.g., when y belongs to a set valued variable which is invariant within the optimization loop, we can relax these restrictions. In the case of c1, the usetodef map will help determine that the value of the tuple valued discontinuity y used in c1 must belong to the range of the map G. This fact allows us to define c1 on entrance to the while loop L of (6) by executing

(9)  (∀ x ∈ Dom G, y ∈ G(x))
        c1(y) := {z ∈ y | z ∉ S};
     end ∀;

and to keep c1 available in L by executing the prederivative code

(10) (∀ y ∈ ({n} - S), t ∈ {w ∈ Dom c1 | y ∈ w})
        c1(t) := c1(t) - {y};
     end ∀;

just prior to the definition S := S + {n} at line 3 of (6).

Assuming then that algorithm ISETL can mark c1 reducible, it will also be able to recognize the other reducible subexpressions of W (cf. (8)). These are

     c2(y) = [+: z ∈ c1(y)] 1,
     c3(x) = {y ∈ G(x) | c2(y) = 0},
     c4(x) = [+: t ∈ c3(x)] 1, and
     c5 = {x ∈ Dom G | x ∉ S & c4(x) ≠ 0}.

Next we envision a slightly more refined version of the reduction algorithm 2SETL of Chapter 4, in which successive applications of reduction would be interleaved with the application of a standard collection of cleanup transformations. This would enable us to simplify (10) directly into

(10') (∀ t ∈ {w ∈ Dom c1 | n ∈ w})
         c1(t) := c1(t) - {n};
      end ∀;

The differentiation rule for c1 will require that the calculation c6(n) = {w ∈ Dom c1 | n ∈ w} occurring within (10') be reduced. c6 can be treated as a special case of elementary form 1 with conjunct H described in Appendix C (iii). Since c6 is invariant within L, we only need to initialize it by executing

(11) c6 := nullset;
     (∀ x ∈ Dom c1, y ∈ x)
        if y ∈ Dom c6 then
           c6(y) := c6(y) + {x}; else
           c6(y) := {x};
        endif;
     end ∀;

immediately after executing (9). Consequently, the code (10') further simplifies to

(12) (∀ t ∈ Ntinrhs(n))
        c1(t) := c1(t) - {n};
     end ∀;

where we suppose that Ntinrhs is the user supplied name replacing c6. The reduction algorithm can now proceed straightforwardly from inner to outer expressions in a manner consistent with the method of Chapter 4 and the tables of Appendix C (iii). The prederivative of c2 with respect to the change to c1 occurring in (12) is

(13)  c2(t) := c2(t) - [+: y ∈ {n}] 1;

which simplifies immediately to

(13') c2(t) := c2(t) - 1;
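Taken together, (11), (12) and (13') amount to an inverted index from each symbol to the terms that mention it, plus a counter per term that is decremented when a symbol joins S. A Python sketch under stated assumptions: terms are tuples, and the counter counts distinct symbols not yet in S (mirroring the set former c1(y) = {w ∈ y | w ∉ S}), so a term with a repeated symbol such as ('A', 'A') still reaches zero after a single decrement. The helper names invert_terms and add_to_S are hypothetical.

```python
from collections import defaultdict


def invert_terms(terms):
    """Analogue of (11): map each symbol y to the set of terms containing y."""
    ntinrhs = defaultdict(set)
    for x in terms:
        for y in x:
            ntinrhs[y].add(x)
    return ntinrhs


def add_to_S(n, S, noncnt, ntinrhs):
    """Analogue of (12) and (13'): just before S := S + {n}, decrement the
    count of unresolved symbols in every term that mentions n."""
    for t in ntinrhs[n]:
        noncnt[t] -= 1
    S.add(n)


terms = [('A', 'B'), ('A', 'A'), ('a',)]
S = set()
# Noncnt counts distinct symbols not yet in S, so ('A', 'A') starts at 1.
noncnt = {t: len({z for z in t if z not in S}) for t in terms}
ntinrhs = invert_terms(terms)
add_to_S('A', S, noncnt, ntinrhs)
```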

Since no uses of c1 occur in the derivative code for c2, we can define c2 at the entry to L and remove all dead definitions of c1 from the program. This puts the original program (6) into the following transitional form:
(14)  1   S := {n ∈ Dom G | Λ ∈ G(n)};
          /* prologue */
      2   (∀ x ∈ Dom G, y ∈ G(x))
      3      c2(y) := [+: z ∈ y | z ∉ S] 1;
      4   end ∀;
      5   Ntinrhs := nullset;
      6   (∀ x ∈ Dom c2, y ∈ x)
      7      if y ∈ Dom Ntinrhs then
      8         Ntinrhs(y) := Ntinrhs(y) + {x}; else
      9         Ntinrhs(y) := {x};
      10     endif;
      11  end ∀;
          /* main loop */
      12  (while ∃ n ∈ {x ∈ Dom G | x ∉ S & ([+: t ∈ {y ∈ G(x) | c2(y) = 0}] 1) ≠ 0})
      13     (∀ t ∈ Ntinrhs(n))
      14        c2(t) := c2(t) - 1;
      15     end ∀;
      16     S := S + {n};
      17  end while;


Next we reduce c3; its prederivative relative to the change to c2 at line 14 of (14) is

(15) if c2(t) = 1 then
        (∀ q ∈ {w ∈ Dom G | t ∈ G(w)})
           c3(q) := c3(q) + {t};    /* strict set addition */
        end ∀;
     endif;

in which c7(t) = {w ∈ Dom G | t ∈ G(w)} must be reduced. Since a use of c2 occurs within (15), c2 cannot be eliminated. Thus, the system will request the user to supply a name for c2. (Let this be Noncnt.) The initializing code for c3, inserted just after line 11 of (14), is

(16) c3 := nullset;
     (∀ x ∈ Dom G, y ∈ G(x) | Noncnt(y) = 0)
        if x ∈ Dom c3 then
           c3(x) := c3(x) + {y}; else
           c3(x) := {y};
        endif;
     end ∀;

To handle c7, which is invariant within the while loop, we only need to define it at the end of the while loop prologue. The code to do this is
(17) /* USER supplies the name Rhstont for c7 */
     Rhstont := nullset;
     (∀ x ∈ Dom G, t ∈ G(x))
        if t ∈ Dom Rhstont then
           Rhstont(t) := Rhstont(t) + {x}; else
           Rhstont(t) := {x};
        endif;
     end ∀;

The current stage of reduction ends after we replace the set former {w ∈ Dom G | t ∈ G(w)} occurring in (15) by the map retrieval operation Rhstont(t).

Proceeding now to the next outer expression c4, which contains c3, we note that the final form of the prederivative of c4 relative to the change to c3 occurring in (15) is

(18) c4(q) := c4(q) + 1;

Since (18) contains no uses of c3, we can replace the assignment to c3(q) in (15) by the assignment (18). After this we can replace (16) by the following code, which initializes c4:

(19) c4 := nullset;
     (∀ x ∈ Dom G, y ∈ G(x) | Noncnt(y) = 0)
        if x ∈ Dom c4 then
           c4(x) := c4(x) + 1; else
           c4(x) := 1;
        endif;
     end ∀;
This makes the workset c5 = {x ∈ Dom G | x ∉ S & c4(x) ≠ 0} ready for reduction. The prederivative of c5 relative to the change (18) in c4 is

(20) if c4(q) = 0 & q ∈ Dom G & q ∉ S then
        c5 := c5 + {q};
     endif;

while the prederivative of c5 relative to the strict element addition S := S + {n} is

(21) (∀ y ∈ {n} | y ∈ Dom G & c4(y) ≠ 0)
        c5 := c5 - {y};
     end ∀;

which can be easily simplified to

(21') if c4(n) ≠ 0 then
         c5 := c5 - {n};
      endif;

Since we cannot eliminate c4, we will refer to it by a new user supplied name, Ntcnt. c5, which the user has named W, can be made available on entry to the while loop by executing W := {x ∈ Dom G | x ∉ S & Ntcnt(x) ≠ 0};

The result of this last series of transformations is the following efficient low level version of (6):

(22)  1   S := {n ∈ Dom G | Λ ∈ G(n)};
          /* prologue */
      2   (∀ x ∈ Dom G, y ∈ G(x))
      3      Noncnt(y) := [+: z ∈ y | z ∉ S] 1;
      4   end ∀;
      5   Ntinrhs := nullset;
      6   (∀ x ∈ Dom Noncnt, y ∈ x)
      7      if y ∈ Dom Ntinrhs then
      8         Ntinrhs(y) := Ntinrhs(y) + {x}; else
      9         Ntinrhs(y) := {x};
      10     endif;
      11  end ∀;
      12  Ntcnt := nullset;
      13  (∀ x ∈ Dom G, y ∈ G(x) | Noncnt(y) = 0)
      14     if x ∈ Dom Ntcnt then
      15        Ntcnt(x) := Ntcnt(x) + 1; else
      16        Ntcnt(x) := 1;
      17     endif;
      18  end ∀;
      19  Rhstont := nullset;
      20  (∀ x ∈ Dom G, t ∈ G(x))
      21     if t ∈ Dom Rhstont then
      22        Rhstont(t) := Rhstont(t) + {x}; else
      23        Rhstont(t) := {x};
      24     endif;
      25  end ∀;
      26  W := {x ∈ Dom G | x ∉ S & Ntcnt(x) ≠ 0};
          /* main loop */
      27  (while ∃ n ∈ W)
      28     (∀ t ∈ Ntinrhs(n))
      29        if Noncnt(t) = 1 then
      30           (∀ q ∈ Rhstont(t))
      31              if Ntcnt(q) = 0 & q ∉ S then
      32                 W := W + {q};
      33              endif;
      34              Ntcnt(q) := Ntcnt(q) + 1;
      35           end ∀;
      36        endif;
      37        Noncnt(t) := Noncnt(t) - 1;
      38     end ∀;
      39     if Ntcnt(n) ≠ 0 then
      40        W := W - {n};
      41     endif;
      42     S := S + {n};
      43  end while;
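A Python sketch of (22), assuming G is a dict from nonterminals to sets of tuples with () standing for Λ, and Python dicts and sets standing in for the maps Noncnt, Ntinrhs, Ntcnt, Rhstont and the workset W. One deliberate choice: Noncnt counts distinct symbols (mirroring c1(y) = {w ∈ y | w ∉ S}), so that a term with a repeated nonterminal still reaches zero after one decrement. The comments refer to the line numbers of (22); the function name is illustrative.

```python
from collections import defaultdict


def nullable_fd(G):
    """Sketch of (22).  G maps each nonterminal to a set of tuples (terms);
    the empty tuple () stands for the empty string."""
    S = {n for n, terms in G.items() if () in terms}                 # line 1
    # Lines 2-4: Noncnt(t) = number of distinct symbols of t not yet in S.
    noncnt = {t: len({z for z in t if z not in S})
              for terms in G.values() for t in terms}
    ntinrhs = defaultdict(set)                                        # lines 5-11
    for t in noncnt:
        for y in t:
            ntinrhs[y].add(t)
    ntcnt = defaultdict(int)                                          # lines 12-18
    rhstont = defaultdict(set)                                        # lines 19-25
    for x, terms in G.items():
        for t in terms:
            rhstont[t].add(x)
            if noncnt[t] == 0:
                ntcnt[x] += 1
    W = {x for x in G if x not in S and ntcnt[x] != 0}                # line 26
    while W:                                                          # line 27
        n = next(iter(W))
        for t in ntinrhs[n]:                                          # line 28
            if noncnt[t] == 1:                                        # line 29
                for q in rhstont[t]:                                  # line 30
                    if ntcnt[q] == 0 and q not in S:                  # lines 31-33
                        W.add(q)
                    ntcnt[q] += 1                                     # line 34
            noncnt[t] -= 1                                            # line 37
        W.discard(n)          # lines 39-41: Ntcnt(n) is nonzero whenever n is in W
        S.add(n)                                                      # line 42
    return S


G = {'S': {('A', 'B')}, 'A': {(), ('a',)}, 'B': {('A', 'A'), ('b',)},
     'C': {('c', 'C')}, 'D': {('C',), ('A', 'S')}}
```

Each nonterminal enters W at most once, and each term is touched once per distinct symbol when that symbol joins S, which is where the cost bound discussed next comes from.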

Observe that (22) represents an improvement over (6) only when the number m of nonterminals (in G) from which the empty string can be derived is large. If we let

$$\ell = \sum_{n \in \mathrm{Dom}\,G} \; \sum_{t \in G(n)} \#t$$

then the expected cost of executing (6) is proportional to ℓ × m, while the expected cost of (22) is O(ℓ). In analyzing (22), we see that the preprocessing is much more expensive than the while loop. Indeed, ℓ elementary steps are consumed by the code from lines 2 to 10 of (22), where the maps Noncnt and Ntinrhs are computed. The time cost of the remaining code within the prologue is bounded by #G. However, the expected cost of the while loop is only proportional to

$$\sum_{n \in S} \#\mathrm{Ntinrhs}(n)$$

where S is the final set of nonterminals of G from which Λ can be derived, and Ntinrhs(n) is the set of all right-hand side terms which contain the nonterminal symbol n.

In comparing the preceding example with the bubble sort example, we cannot help but note that despite the potential improvement in transformational mechanization suggested by our derivation of the grammar algorithm, neither example shows any overwhelming speedup. The next two examples will exhibit much greater degrees of speedup.

In Chapter 4 (D) we obtained an order of magnitude speedup by formally differentiating a restricted version of Habermann's Banker's Algorithm. In the following example, we apply FD to an unrestricted version of the Banker's Algorithm and realize a logarithmic speedup in general, and an order of magnitude improvement if the cost of preprocessing can be neglected.

The general Banker's Algorithm considers a bank with several kinds of currency. For each kind i ∈ R of currency, cash(i) represents the total amount of this currency controlled by the bank; loan(i,c) is the loan of type i currency presently out to customer c; and claim(i,c) is customer c's claim for type i currency. The strategy of the general algorithm is the same as that of the simplified version presented in Chapter 4; i.e., any customer whose full claim can be met by the bank can be satisfied. But since we now have #R different currencies, the bank can only meet the demands of a customer c if the predicate ∀ i ∈ R | claim(i,c) ≤ cash(i) holds.
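A direct reading of this strategy in Python, offered as a reference point for the derivation that follows. We assume claim and loan are dicts keyed by (currency, customer) pairs, and we choose the smallest eligible customer only to make the run deterministic (the SETL program allows any choice). The function name and the example data are hypothetical.

```python
def bankers_base(R, Cus, cash, loan, claim):
    """Transcription of the base form (23).  Returns the customers in the
    order served and the set of customers that could never be served."""
    cash, Cus, order = dict(cash), set(Cus), []
    while True:
        # Line 1: rescan every remaining customer against every currency.
        c = next((u for u in sorted(Cus)
                  if all(claim[i, u] <= cash[i] for i in R)), None)
        if c is None:
            return order, Cus
        for i in R:                       # lines 2-4: the loans come back
            cash[i] += loan[i, c]
        Cus.discard(c)                    # line 5
        order.append(c)


R = ['a', 'b']
Cus = {1, 2, 3}
cash = {'a': 1, 'b': 1}
claim = {('a', 1): 1, ('b', 1): 1, ('a', 2): 3, ('b', 2): 1, ('a', 3): 4, ('b', 3): 2}
loan = {('a', 1): 2, ('b', 1): 0, ('a', 2): 0, ('b', 2): 5, ('a', 3): 0, ('b', 3): 0}
```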

A base form version of the full Banker's Algorithm can be written as follows:

(23)  1   (while ∃ c ∈ Cus | (∀ i ∈ R | claim(i,c) ≤ cash(i)))
      2      (∀ i ∈ R)
      3         cash(i) := cash(i) + loan(i,c);
      4      end ∀;
      5      Cus := Cus - {c};
      6   end while;

This executes in time proportional to (#Cus)² × #R on the average. In order to differentiate (23) we must first transform it into a more convenient form. This can be done by applying the following sequence of transformations to the while loop predicate: P3 and P13 of Appendix D (v), R6, R4, and R5 of Appendix D (ii), and finally P6 of Appendix D (v). The code which results is
(24)  1   (while ∃ c ∈ {u ∈ Cus | [+: y ∈ {i ∈ R | claim(i,u) > cash(i)}] 1 = 0})
      2      (∀ i ∈ R)
      3         cash(i) := cash(i) + loan(i,c);
      4      end ∀;
      5      Cus := Cus - {c};
      6   end while;

This form of the Banker's Algorithm is especially amenable to FD, since only one user directive,

(25)  $FD,1,Gcus = {u ∈ Cus | [+: y ∈ {i ∈ R | claim(i,u) > cash(i)}] 1 = 0},

is needed to speed up (24). The program analysis applied to process the command (25) will recognize that the expressions c1(u) = {i ∈ R | claim(i,u) > cash(i)}, c2(u) = [+: y ∈ c1(u)] 1, and c3 = {u ∈ Cus | c2(u) = 0} are all reducible. The reduction procedure will handle c3 by first differentiating the innermost subexpression c1 of c3. The prederivative code for c1 with respect to the change to cash(i) at line 3 of (24) is

(26)  (while xmin(i) < cash(i) + loan(i,c))
         (∀ x ∈ {w ∈ Dom claim{i} | claim(i,w) = xmin(i)})
            c1(x) := c1(x) - {i};     /* strict deletion */
         end ∀;
         xmin(i) := succ(i, xmin(i));
      end while;
The map c1 and the auxiliary maps xmin and succ occurring in (26) are initialized by executing the following code

(27)  c1 := nullset;
      (∀ x ∈ R, t ∈ Dom claim{x} | claim(x,t) > cash(x))
         if t ∈ Dom c1 then
            c1(t) := c1(t) + {x}; else
            c1(t) := {x};
         endif;
      end ∀;
      (∀ x ∈ R)       /* sort claim{x} and produce succ(x) */
         sortas(Dom claim{x}, x);
         xmin(x) := [min: y ∈ Dom claim{x} | claim(x,y) > cash(x)] claim(x,y);
      end ∀;

just prior to line 1 of (24).

However, for (26) to be profitable, we must reduce the costly set former c4(i) = {w ∈ Dom claim{i} | claim(i,w) = xmin(i)}. c4 depends discontinuously on xmin(i) and on i. Unfortunately, a general expression of the form

     c(q1,q2,q3) = {w ∈ F1(q1) | F2(q2,w) = q3},

to which c4 can be matched, will not usually be profitably reducible. The cost of initializing c is exorbitant; this can be observed by inspection of the following initialization code.

(28)  c := nullset;
      (∀ t1 ∈ Dom F1, t2 ∈ Dom F2, w ∈ F1(t1))
         if [t1, t2, F2(t2,w)] ∈ Project(c,3) then
            c(t1, t2, F2(t2,w)) := c(t1, t2, F2(t2,w)) + {w}; else
            c(t1, t2, F2(t2,w)) := {w};
         endif;
      end ∀;

This code executes in time proportional to #(Dom F1) × #(Dom F2) × n, where n is a uniform bound on #F1(t1). However, when the discontinuity parameters q1 and q2 in c can both be replaced by two occurrences of a single discontinuity parameter q, (28) can be improved to the following more efficient code,

(29)  c := nullset;
      (∀ t ∈ Dom F1, w ∈ F1(t) | t ∈ Dom F2)
         if [t, F2(t,w)] ∈ Project(c,2) then
            c(t, F2(t,w)) := c(t, F2(t,w)) + {w}; else
            c(t, F2(t,w)) := {w};
         endif;
      end ∀;

Note that (29) is formed from (28) by eliminating the second component of c, and also by turning the iterator t2 ∈ Dom F2 within (28) into a membership test in (29).
This technique can be generalized to a rule for efficiently handling expressions which depend on redundant discontinuity parameters. Such a rule, if incorporated into the implementation design of Chapter 4, would significantly expand the contexts in which FD can be profitably applied.
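With a single discontinuity parameter, initialization in the style of (29) reduces to one pass that files each w under the key (t, F2(t,w)). A Python sketch specialized to the map claim (this is the shape of the initialization (30) used below), assuming claim is a dict keyed by (currency, customer) pairs and modeling c(t, v) as a dict of sets keyed by (t, v). The function name is hypothetical.

```python
from collections import defaultdict


def index_by_value(claim):
    """One pass over the pairs of claim: customer w is filed under the key
    (t, claim(t, w)), so c(t, v) is the set of customers whose claim in
    currency t equals v."""
    c = defaultdict(set)
    for (t, w), value in claim.items():
        c[t, value].add(w)
    return dict(c)


claim = {('a', 1): 1, ('a', 2): 3, ('a', 3): 3, ('b', 1): 2}
Ycus = index_by_value(claim)
```

The cost is proportional to the number of pairs in claim, rather than the product #(Dom F1) × #(Dom F2) × n incurred by (28).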

In the case of c4, xmin(i) and i are the only parameters on which c4 depends and which undergo modification within the while loop beginning at line 1 of (24). Thus, to reduce c4 we only need to initialize it at loop entry. The code to do this is based on (29) and is

(30)  c4 := nullset;
      (∀ t ∈ Dom claim, w ∈ Dom claim{t} | t ∈ Dom claim)
         if [t, claim(t,w)] ∈ Project(c4,2) then
            c4(t, claim(t,w)) := c4(t, claim(t,w)) + {w}; else
            c4(t, claim(t,w)) := {w};
         endif;
      end ∀;

where the membership test t ∈ Dom claim appearing in line 2 above can be removed as superfluous.

The next step in reducing Gcus requires reduction of c2. The prederivative of c2 relative to the change c1(x) := c1(x) - {i} within (26) is

(31)  c2(x) := c2(x) - [+: v ∈ {i}] 1;

which simplifies to

(31') c2(x) := c2(x) - 1;

Since (31') shows that c2 does not depend on c1, all definitions of c1 in the program are dead. Thus, we can replace the change to c1 in (26) by (31') and the initializing code (27) by the following code,

(32)  c2 := nullset;
      (∀ x ∈ R, t ∈ Dom claim{x} | claim(x,t) > cash(x))
         if t ∈ Dom c2 then
            c2(t) := c2(t) + 1; else
            c2(t) := 1;
         endif;
      end ∀;
      (∀ x ∈ R)
         sortas(Dom claim{x}, x);    /* sort claim{x} */
         xmin(x) := [min: y ∈ Dom claim{x} | claim(x,y) > cash(x)] claim(x,y);
      end ∀;

Finally c3 must be reduced. The prederivative of c3 relative to the change (31') is

(33)  if x ∈ Cus & c2(x) = 1 then
         c3 := c3 + {x};
      endif;

where the membership test x ∈ Cus within (33) can be eliminated. The prederivative code for c3 relative to the element deletion Cus := Cus - {c} is

(34)  (∀ y ∈ {c} | c2(y) = 0)
         c3 := c3 - {y};
      end ∀;

which after cleanup becomes the following equivalent code,

(34') c3 := c3 - {c};

Since (33) contains a use of c2, c2 must remain in the program under a user supplied name, say Count. A last reduction step inserts the assignment Gcus := {x ∈ Cus | Count(x) = 0} at the end of the prologue.

The low level SETL version of the Banker's Algorithm derived from (23) is as follows:

(35)      /* Prologue */
      1   Count := nullset;
      2   (∀ x ∈ R, t ∈ Dom claim{x} | claim(x,t) > cash(x))
      3      if t ∈ Dom Count then
      4         Count(t) := Count(t) + 1; else
      5         Count(t) := 1;
      6      endif;
      7   end ∀;
      8   (∀ x ∈ R)
      9      sortas(Dom claim{x}, x);
      10     xmin(x) := [min: y ∈ Dom claim{x} | claim(x,y) > cash(x)] claim(x,y);
      11  end ∀;
      12  Ycus := nullset;      /* Ycus replaces the name c4 */
      13  (∀ t ∈ Dom claim, w ∈ Dom claim{t})
      14     if [t, claim(t,w)] ∈ Project(Ycus,2) then
      15        Ycus(t, claim(t,w)) := Ycus(t, claim(t,w)) + {w}; else
      16        Ycus(t, claim(t,w)) := {w};
      17     endif;
      18  end ∀;
      19  Gcus := {x ∈ Cus | Count(x) = 0};
          /* Main loop */
      20  (while ∃ c ∈ Gcus)
      21     (∀ i ∈ R)
      22        (while xmin(i) < cash(i) + loan(i,c))
      23           (∀ x ∈ Ycus(i, xmin(i)))
      24              if Count(x) = 1 then
      25                 Gcus := Gcus + {x};
      26              endif;
      27              Count(x) := Count(x) - 1;
      28           end ∀;
      29           xmin(i) := succ(i, xmin(i));
      30        end while;
      31        cash(i) := cash(i) + loan(i,c);
      32     end ∀;
      33     Gcus := Gcus - {c};
      34  end while;
Note that the dead assignment Cus := Cus - {c}, which would
otherwise follow line 33 of (35), has been eliminated.

In analyzing the expected running time of (35) we
will make the following assumptions:

1.  (∀x ∈ R) Dom claim{x} ⊆ Cus;

2.  Dom claim ⊆ R.

We can then estimate that executing the prologue
of (35) will take no more than O(#R × #Cus × log #Cus)
elementary steps. This is the cost of the loop appearing
in lines 8-11, which sorts the claims on each resource; the
other loops within the prologue require no more than
#R × #Cus or #Cus steps. The main loop itself should run in
time proportional to #R × #Cus at most: each customer leaves
Gcus at most once, and since xmin(i) only advances, each
customer is visited at most once per resource i in line 23.
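The Python sketch below mirrors the structure of (35) so that the incremental bookkeeping can be run and checked. It rests on assumptions of our own: Python dicts and sets stand in for SETL maps and sets; claim, cash and loan are nested dicts (claim[x][t] is customer t's claim on resource x, loan[x][t] the units of x lent to t); and a customer is blocked on x exactly when claim[x][t] > cash[x], as in the prologue. A sorted list of distinct claim values plus an index plays the role of xmin and succ, and the index is advanced over every claim value the increased cash now covers (claim ≤ cash), which is how we read the test at line 22. The function name banker_safe_order is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def banker_safe_order(R, Cus, claim, cash, loan):
    """Sketch of the reduced Banker's Algorithm (35) (not the paper's code).

    Assumed encodings (ours): claim[x][t] is customer t's claim on resource x,
    cash[x] the free units of x, loan[x][t] the units of x lent to t (missing = 0).
    Returns customers in the order the main loop retires them.
    """
    cash = dict(cash)                          # cash(i) is updated at line 31
    count = defaultdict(int)                   # Count(t), lines 1-7
    for x in R:
        for t, v in claim.get(x, {}).items():
            if v > cash[x]:
                count[t] += 1
    ycus, values, pos = {}, {}, {}
    for x in R:
        ycus[x] = defaultdict(set)             # Ycus(x, v), lines 12-18
        for t, v in claim.get(x, {}).items():
            ycus[x][v].add(t)
        values[x] = sorted(ycus[x])            # sortas, line 9
        pos[x] = bisect_right(values[x], cash[x])  # index of xmin(x), line 10
    gcus = {t for t in Cus if count[t] == 0}   # line 19
    order = []
    while gcus:                                # line 20
        c = next(iter(gcus))
        order.append(c)
        for i in R:                            # lines 21-32
            new_cash = cash[i] + loan.get(i, {}).get(c, 0)
            vals = values[i]
            while pos[i] < len(vals) and vals[pos[i]] <= new_cash:
                for x in ycus[i][vals[pos[i]]]:
                    if count[x] == 1:          # prederivative (33), line 24
                        gcus.add(x)
                    count[x] -= 1
                pos[i] += 1                    # xmin(i) := succ(i, xmin(i))
            cash[i] = new_cash
        gcus.discard(c)                        # line 33
    return order
```

For a single resource with cash 2, claims {a: 1, b: 3, c: 5} and loans {a: 1, b: 2, c: 1}, the sketch retires a, then b, then c. A returned order shorter than Cus signals that some customers can never be satisfied.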

The preceding techniques for eliminating redundant
discontinuity parameters also prove useful in the next
example, a form of Kildall's iterative algorithm [KI2] for
computing the expressions available at each node of a program
flow graph. Input for this algorithm consists of the following:

1.  the set N_F of nodes in the flow graph, where each node
corresponds to a basic block of the program;

2.  the set CV of potentially available expressions;

3.  a map pred which maps each node n ∈ N_F into the set
pred(n) of predecessor nodes in the flow graph;

4.  a preserved set map PR which maps each node n ∈ N_F
into the set PR(n) of all expressions e ∈ CV such that no
parameter of e is defined within n;

5.  an exposed definitions map XE which associates each
node n ∈ N_F with the set XE(n) of all expressions e whose
value is saved in a temporary σ by an assignment σ := e
occurring within n at a place after which σ is not spoiled
in n.

The algorithm will output the set AE(n) of expressions
available at the top of each node n ∈ N_F.

In SETL a base form version of the available expressions
algorithm is

(36)  AE := nullset;
      (∀n ∈ N_F)
         AE(n) := CV;
      end ∀;
      (while ∃n ∈ N_F | AE(n) ≠ [*: y ∈ pred(n)] ((AE(y) * PR(y)) + XE(y)))
         AE(n) := [*: y ∈ pred(n)] ((AE(y) * PR(y)) + XE(y));
      end while;
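A direct Python transcription of (36) is useful as a reference point for the reduced versions that follow. In this sketch (function name hypothetical), SETL's * and + on sets become Python's & and |; nodes are visited round-robin, whereas (36) allows any node that violates its equation to be chosen; XE(n) ⊆ CV is assumed; and a node with no predecessors keeps AE(n) = CV, which is also what the canonical form (37) does for such a node.

```python
def available_base_form(N, CV, pred, PR, XE):
    """Round-robin reading of the base form (36).

    Assumptions (ours): N is a list of nodes, pred/PR/XE are dicts of sets,
    XE(n) is a subset of CV, and a node with no predecessors keeps AE(n) = CV.
    """
    AE = {n: set(CV) for n in N}            # AE(n) := CV for every node

    def rhs(n):
        # [*: y in pred(n)] ((AE(y) * PR(y)) + XE(y)) with * = & and + = |
        result = set(CV)
        for y in pred.get(n, ()):
            result &= (AE[y] & PR[y]) | XE[y]
        return result

    changed = True
    while changed:                          # some AE(n) still differs from rhs(n)
        changed = False
        for n in N:
            new = rhs(n)
            if new != AE[n]:
                AE[n] = new
                changed = True
    return AE
```

To give an entry block an empty AE, one can add a hypothetical dummy predecessor whose PR and XE are both empty: with nodes 0 (dummy), 1, 2, 3, pred = {1: {0}, 2: {1, 3}, 3: {2}}, CV = {a, b}, PR(2) = {a}, XE(1) = {a, b}, the sketch yields AE(1) = {}, AE(2) = AE(3) = {a}.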


In order to improve (36) by FD, we must first take several
manual steps to transform (36) into the following canonical
form:

(37)  1  AE := nullset;
      2  (∀n ∈ N_F)
      3     AE(n) := CV;
      4  end ∀;
      5  (while ∃n ∈ {m ∈ N_F | [+: y ∈ {z ∈ AE(m) | [+: w ∈ {x ∈ pred(m) |
               (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x)}] 1 ≠ 0}] 1 ≠ 0})
      6     AE(n) := AE(n) - {z ∈ AE(n) | [+: w ∈ {x ∈ pred(n) |
               (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x)}] 1 ≠ 0};
      7  end while;
Within the loop appearing in lines 5-7 of (37), there
occur five different reducible expressions; these are

   c1(m,z) = {x ∈ pred(m) | (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x)},
   c2(m,z) = [+: w ∈ c1(m,z)] 1,
   c3(m)   = {z ∈ AE(m) | c2(m,z) ≠ 0},
   c4(m)   = [+: y ∈ c3(m)] 1, and
   c5      = {m ∈ N_F | c4(m) ≠ 0}.

(Note that c1 is an instance of a general reducible form
not currently included within our FD tables.)

The expressions c1, c2, and c3 occur at both lines 5 and 6,
while the other reducible expressions occur only at line 5.
Unfortunately, we are unable to apply FD to (37) successfully,
because the prederivative code for updating c1, c2, and c3
would be executed prior to line 6, thus spoiling the old
values of c1, c2, and c3 that are needed at line 6.
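The five reducible expressions can be evaluated naively, straight from their definitions, to see what FD will later maintain incrementally. This Python sketch (function names hypothetical) returns c1 through c4 as functions of the current AE and c5 as a set, and runs lines 5-7 of (37) by recomputing everything on each pass. Since line 6 removes exactly c3(n) from AE(n), c5 is precisely the set of nodes still violating the equation of (36); the iteration order (any member of c5) is our choice.

```python
def reducible_exprs(N, AE, pred, PR, XE):
    """Evaluate c1..c5 of (37) directly from their definitions (no reduction)."""
    def c1(m, z):
        # predecessors x of m through which expression z is not available
        return {x for x in pred.get(m, ())
                if (z not in AE[x] or z not in PR[x]) and z not in XE[x]}

    def c2(m, z):
        return len(c1(m, z))             # [+: w in c1(m,z)] 1

    def c3(m):
        return {z for z in AE[m] if c2(m, z) != 0}

    def c4(m):
        return len(c3(m))                # [+: y in c3(m)] 1

    c5 = {m for m in N if c4(m) != 0}    # the workset W of (39)
    return c1, c2, c3, c4, c5


def canonical_loop(N, CV, pred, PR, XE):
    """Lines 1-7 of (37): shrink AE(n) by c3(n) while c5 is non-empty."""
    AE = {n: set(CV) for n in N}
    while True:
        _, _, c3, _, c5 = reducible_exprs(N, AE, pred, PR, XE)
        if not c5:
            return AE
        n = next(iter(c5))               # any member of c5 will do
        AE[n] -= c3(n)                   # line 6: AE(n) := AE(n) - c3(n)
```

Every call to reducible_exprs rescans all nodes, predecessors and expressions; this repeated cost is what the reductions of (40)-(53) eliminate.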

A remedy for this can be worked out by decomposing
the assignment at line 6 into the following forall iterator:

(38)  (∀y ∈ {z ∈ AE(n) | [+: w ∈ {x ∈ pred(n) | (z ∉ AE(x)
            or z ∉ PR(x)) & z ∉ XE(x)}] 1 ≠ 0})
         AE(n) := AE(n) - {y};
      end ∀;

After this, the algorithm can be differentiated with a single
user directive:


(39)  $FD,5,W = {m ∈ N_F | [+: y ∈ {z ∈ AE(m) | [+: w ∈ {x ∈ pred(m) |
            (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x)}] 1 ≠ 0}] 1 ≠ 0}
In order to reduce W, its innermost reducible
subexpression c1 must be reduced first. Since c1 contains
three occurrences of the same discontinuity parameter z,
we can apply the techniques of the previous case study to
improve the derivative code for c1 relative to the change
AE(n) := AE(n) - {y} occurring within (38). This involves
replacing potentially costly iterations over PR(x)
and over the range CV of possible values for z by simple
membership tests. In raw form before cleanup, the prederivative
code for c1 is

(40)  (∀z ∈ {u ∈ Dom pred | n ∈ pred(u)}, x ∈ {y} | x ∈ PR(n) & x ∉ XE(n))
         c1(z,x) := c1(z,x) + {n};
      end ∀;

where the expression succ(n) = {u ∈ Dom pred | n ∈ pred(u)}
must itself be reduced. To reduce succ, we have only to place
the following initialization code

(41)  succ := nullset;
      (∀u ∈ Dom pred, n ∈ pred(u))
         if n ∈ Dom succ then
            succ(n) := succ(n) + {u}; else
            succ(n) := {u};
         endif;
      end ∀;

just before line 5 of (37). After straightforward
application of standard cleanup transformations, (40) can
be rewritten in the following efficient form:

(42)  if y ∈ PR(n) & y ∉ XE(n) then
         (∀z ∈ succ(n))
            c1(z,y) := c1(z,y) + {n};
         end ∀;
      endif;
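The claim that (42) keeps c1 up to date can be tested mechanically. The Python sketch below (all function names hypothetical) builds succ as in (41), tabulates c1 from scratch as in (43) for pairs (m, z) with z ∈ AE(m), and applies each removal AE(n) := AE(n) - {y} together with the prederivative (42). The reasoning it encodes: before the removal, n ∈ c1(z,y) iff y ∉ PR(n) & y ∉ XE(n); afterwards, iff y ∉ XE(n); so n must be added exactly when y ∈ PR(n) & y ∉ XE(n).

```python
from collections import defaultdict


def build_succ(pred):
    """(41): succ(n) = {u in Dom pred | n in pred(u)}."""
    succ = defaultdict(set)
    for u, ps in pred.items():
        for n in ps:
            succ[n].add(u)
    return succ


def c1_table(AE, pred, PR, XE):
    """From-scratch c1(m, z), tabulated as in (43) for z in AE(m)."""
    c1 = defaultdict(set)
    for m, ps in pred.items():
        for x in ps:
            for z in AE[m]:
                if (z not in AE[x] or z not in PR[x]) and z not in XE[x]:
                    c1[(m, z)].add(x)
    return c1


def remove_and_update(n, y, AE, succ, PR, XE, c1):
    """AE(n) := AE(n) - {y}, preceded by the prederivative code (42)."""
    if y in PR[n] and y not in XE[n]:
        for z in succ.get(n, ()):
            c1[(z, y)].add(n)
    AE[n].discard(y)
```

Entries c1(m, z) whose z has already left AE(m) are never consulted again by (37), so the comparison against a fresh table only needs to cover z ∈ AE(m).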

To complete the reduction of c1, we place the following
code within the loop prologue:

(43)  c1 := nullset;
      (∀m ∈ Dom pred, x ∈ pred(m), z ∈ AE(m) |
            (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x))
         if [m,z] ∈ Project(c1, 2) then
            c1(m,z) := c1(m,z) + {x}; else
            c1(m,z) := {x};
         endif;
      end ∀;

The next reduction step involves c2. Its prederivative
relative to the change to c1 within (42) is simply

(44)  c2(z,y) := c2(z,y) + 1;

which makes c1 useless within the loop. Hence, we replace
the assignment to c1 in (42) by (44), and within (43) we
first differentiate c2 relative to the changes to c1 and
then eliminate all code that refers to c1. The code
that replaces (43) is then
(45)  c2 := nullset;
      (∀m ∈ Dom pred, x ∈ pred(m), z ∈ AE(m) |
            (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x))
         if [m,z] ∈ Project(c2, 2) then
            c2(m,z) := c2(m,z) + 1; else
            c2(m,z) := 1;
         endif;
      end ∀;

The next expression to reduce is c3. Its prederivative
relative to the change (44) to c2 is

(46)  if c2(z,y) = 0 then
         c3(z) := c3(z) + {y};
      endif;

Its prederivative relative to the change AE(n) := AE(n) - {y}
is just

(47)  c3(n) := c3(n) - {y};

Note in connection with (46) that the variable z within
(46) cannot have the same value as n when c3 is actually
modified: if z = n, then c2(n,y) ≠ 0, since y was drawn from
c3(n) and c2 only increases within the loop, so the test in
(46) fails. This determination requires relatively simple
reasoning (lending support to the possibility of proving this
fact automatically), yet it is crucial to the entire reduction.
After initializing c3 by inserting the code
(48)  c3 := nullset;
      (∀m ∈ Dom AE, z ∈ AE(m) | c2(m,z) ≠ 0)
         if m ∈ Dom c3 then
            c3(m) := c3(m) + {z}; else
            c3(m) := {z};
         endif;
      end ∀;

at the end of the prologue, we can replace uses of the
expression c3 by a map retrieval operation using the
variable c3.

All this puts the available expressions algorithm into
the following transitional form:

(49)   1  AE := nullset;
       2  (∀n ∈ N_F)
       3     AE(n) := CV;
       4  end ∀;
       5  numpred := nullset;     /* the user name for c2 */
       6  (∀m ∈ Dom pred, x ∈ pred(m), z ∈ AE(m) |
                (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x))
       7     if [m,z] ∈ Project(numpred, 2) then
       8        numpred(m,z) := numpred(m,z) + 1; else
       9        numpred(m,z) := 1;
      10     endif;
      11  end ∀;
      12  succ := nullset;
      13  (∀u ∈ Dom pred, n ∈ pred(u))
      14     if n ∈ Dom succ then
      15        succ(n) := succ(n) + {u}; else
      16        succ(n) := {u};
      17     endif;
      18  end ∀;
      19  Del := nullset;     /* Del stands for c3 */
      20  (∀m ∈ Dom AE, z ∈ AE(m) | numpred(m,z) ≠ 0)
      21     if m ∈ Dom Del then
      22        Del(m) := Del(m) + {z}; else
      23        Del(m) := {z};
      24     endif;
      25  end ∀;
          /* main loop */
      26  (while ∃n ∈ {m ∈ N_F | [+: y ∈ Del(m)] 1 ≠ 0})
      27     (∀y ∈ Del(n))
      28        if y ∈ PR(n) & y ∉ XE(n) then
      29           (∀z ∈ succ(n))
      30              if numpred(z,y) = 0 then
      31                 Del(z) := Del(z) + {y};
      32              endif;
      33              numpred(z,y) := numpred(z,y) + 1;
      34           end ∀;
      35        endif;
      36        Del(n) := Del(n) - {y};
      37        AE(n) := AE(n) - {y};
      38     end ∀;
      39  end while;
Reduction continues with the handling of c4(m) = [+: y ∈ Del(m)] 1.
Just prior to lines 31 and 36, where Del is modified, we must
insert the following prederivative code for c4:

(50)  c4(z) := c4(z) + 1;      /* BEFORE line 31 */
and

(50') c4(n) := c4(n) - 1;      /* BEFORE line 36 */

c4 may also be initialized incrementally within the
initializing code for Del at lines 19-25. That is, just
prior to the assignment Del := nullset at line 19 we place
the assignment c4 := nullset, and just before lines 22
and 23 we insert the incremental code

(51)  c4(m) := c4(m) + 1;
and

(51') c4(m) := 1;

respectively.

This brings the reduction process into a final stage
in which the workset W = {m ∈ N_F | c4(m) ≠ 0} can be reduced.
W depends only on the definitions (50) and (50'), which occur
within the optimization loop. The prederivative code for W
relative to these definitions is

(52)  if c4(z) = 0 then
         W := W + {z};
      endif;

with respect to (50), and

(52') if c4(n) = 1 then
         W := W - {n};
      endif;
with respect to (50'). It is clear from both (52) and
(52') that c4 must be included in our final algorithm.
Thus, we will supply the name numdif to replace c4. The
last task is to insert the assignment
W := {m ∈ N_F | numdif(m) ≠ 0} at the end of the prologue.
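Steps (50) through (52') follow one pattern: a counter and the set of keys whose counter is nonzero are updated together, with the set test placed before the counter change. A minimal Python sketch of that pairing, where the class name CountedWorkset and its methods incr and decr are hypothetical stand-ins for (50)/(52) and (50')/(52'):

```python
from collections import defaultdict


class CountedWorkset:
    """numdif(m) together with W = {m | numdif(m) != 0}, as in (50)-(52')."""

    def __init__(self):
        self.numdif = defaultdict(int)
        self.W = set()

    def incr(self, z):
        if self.numdif[z] == 0:      # (52): z is about to become nonzero
            self.W.add(z)
        self.numdif[z] += 1          # (50)

    def decr(self, n):
        # Assumes numdif(n) >= 1, which holds in (53) because y is in Del(n).
        if self.numdif[n] == 1:      # (52'): n is about to drop to zero
            self.W.discard(n)
        self.numdif[n] -= 1          # (50')
```

Each update costs O(1), so the test `while ∃n ∈ W` no longer requires a scan over N_F.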

Collecting the results of all of the previous transformations,
we arrive at the following low level SETL version of (36):

(53)   1  AE := nullset;
       2  (∀n ∈ N_F)
       3     AE(n) := CV;
       4  end ∀;
          /* prologue */
       5  numpred := nullset;
       6  (∀m ∈ Dom pred, x ∈ pred(m), z ∈ AE(m) |
                (z ∉ AE(x) or z ∉ PR(x)) & z ∉ XE(x))
       7     if [m,z] ∈ Project(numpred, 2) then
       8        numpred(m,z) := numpred(m,z) + 1; else
       9        numpred(m,z) := 1;
      10     endif;
      11  end ∀;
      12  succ := nullset;
      13  (∀u ∈ Dom pred, n ∈ pred(u))
      14     if n ∈ Dom succ then
      15        succ(n) := succ(n) + {u}; else
      16        succ(n) := {u};
      17     endif;
      18  end ∀;
      19  numdif := nullset;
      20  Del := nullset;
      21  (∀m ∈ Dom AE, z ∈ AE(m) | numpred(m,z) ≠ 0)
      22     if m ∈ Dom Del then
      23        numdif(m) := numdif(m) + 1;
      24        Del(m) := Del(m) + {z}; else
      25        numdif(m) := 1;
      26        Del(m) := {z};
      27     endif;
      28  end ∀;
      29  W := {m ∈ N_F | numdif(m) ≠ 0};
          /* main loop */
      30  (while ∃n ∈ W)
      31     (∀y ∈ Del(n))
      32        if y ∈ PR(n) & y ∉ XE(n) then
      33           (∀z ∈ succ(n))
      34              if numpred(z,y) = 0 then
      35                 if numdif(z) = 0 then
      36                    W := W + {z};
      37                 endif;
      38                 numdif(z) := numdif(z) + 1;
      39                 Del(z) := Del(z) + {y};
      40              endif;
      41              numpred(z,y) := numpred(z,y) + 1;
      42           end ∀;
      43        endif;
      44        if numdif(n) = 1 then
      45           W := W - {n};
      46        endif;
      47        numdif(n) := numdif(n) - 1;
      48        Del(n) := Del(n) - {y};
      49        AE(n) := AE(n) - {y};
      50     end ∀y;
      51  end while;

It is easily seen that our final version of the available
expressions algorithm executes in at most O(#CV × #SUCC)
elementary steps, a considerable saving over the base form
algorithm (36).
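One way to gain confidence that the chain of transformations preserves the meaning of (36) is to transcribe (53) into Python and compare it with a direct iteration of (36) on random flow graphs. The transcription below is our own sketch (function name hypothetical): dicts of sets stand in for SETL maps; missing numpred and numdif entries count as 0 where SETL would yield an undefined value; Del(n) is snapshotted before line 31, which is safe because Del(n) does not grow while it is being scanned; and a node with no predecessors keeps AE(n) = CV.

```python
from collections import defaultdict


def available_fd(N, CV, pred, PR, XE):
    """Transcription of the low-level form (53) (illustrative sketch)."""
    AE = {n: set(CV) for n in N}                     # lines 1-4
    numpred = defaultdict(int)                       # lines 5-11: c2
    for m, ps in pred.items():
        for x in ps:
            for z in AE[m]:
                if (z not in AE[x] or z not in PR[x]) and z not in XE[x]:
                    numpred[(m, z)] += 1
    succ = defaultdict(set)                          # lines 12-18
    for u, ps in pred.items():
        for n in ps:
            succ[n].add(u)
    numdif = defaultdict(int)                        # lines 19-28: c4, Del = c3
    Del = defaultdict(set)
    for m in N:
        for z in AE[m]:
            if numpred[(m, z)] != 0:
                numdif[m] += 1
                Del[m].add(z)
    W = {m for m in N if numdif[m] != 0}             # line 29
    while W:                                         # line 30
        n = next(iter(W))
        for y in list(Del[n]):                       # line 31 (snapshot)
            if y in PR[n] and y not in XE[n]:        # line 32
                for z in succ[n]:                    # lines 33-42
                    if numpred[(z, y)] == 0:
                        if numdif[z] == 0:
                            W.add(z)
                        numdif[z] += 1
                        Del[z].add(y)
                    numpred[(z, y)] += 1
            if numdif[n] == 1:                       # lines 44-46
                W.discard(n)
            numdif[n] -= 1                           # line 47
            Del[n].discard(y)                        # line 48
            AE[n].discard(y)                         # line 49
    return AE
```

Each pair (n, y) is removed from AE at most once, and each removal scans succ(n) once, which matches the O(#CV × #SUCC) bound stated above.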